ABSTRACT


This thesis has two parts, both related to the development
of smart sensor systems. The first part is a theoretical
development of two families of adaptive spatial filters for
suppressing background clutter in infrared images, based on
either the minimization of mean squared error or the maximization
of signal to noise ratio criterion. Seven different nonlinear
search techniques have been developed for the adaptation
process. They have been applied to two real world infrared test
images and exhibit a fast convergence rate with no misadjustment.
The second part is an experimental development of a
multiple microcomputer system which can be a candidate for an
on-board processor system. A multiple star, multiple cluster
architecture was developed whose intercommunication is managed
by a three level control including a central controller, a
distributed controller and a random priority controller. The
adaptive spatial filter has been successfully implemented on
this system using partitioning for parallel computing.

I. INTRODUCTION


A.  OBJECTIVES 

1. Dual Objectives of this Thesis

This thesis consists of two closely related studies.

a. The first study is the theoretical development
of adaptive image processing algorithms for enhancement of the
"target signal" to "clutter noise" ratio in images. These
algorithms will be used in the first step of a multiple-stage
image processing program for detection of dim targets in noisy
infrared images.

b. The second study is an experimental development
of a multiple microcomputer system for implementation of these
adaptive image processing algorithms.

These two studies belong to two different technical
areas. Either topic could be the subject of one thesis
project. However, they are investigated together in this thesis
because of the special nature of a newly emerging field which
inspired the research undertaken by this project. This new
field is sometimes known as "Smart Sensors" [1, 2, 3].
Its development got into high gear only in the late 1970's
when advances in two integrated circuit fields, VLSI digital
electronics and mosaic optical sensor arrays, were joined
together to develop new optical sensors which also have
sophisticated on-board signal/data processing capabilities.
In other words, they are SMART SENSORS. Their importance is
closely associated with the coexistence of "sensing" and
"processing" capabilities on a small volume, light weight,
low power platform. Therefore, the successful development
of "smart sensor" systems includes not only new signal/data
processing algorithms to provide the needed "smartness" but
also efficient implementation by signal/data processors whose
size, weight, power and performance are compatible with the
requirements of on-board equipment in many practical military
systems.

2.  Multi-Dimensional  "Smart  Sensor"  Signal  Processing 

In most optical smart sensor systems, signals of
interest are in the form of images. If the field of view
of the sensor platform is neither stabilized nor locked onto a
target, successive frames of images are not registered, and
signal processing can only use single frames of an image.
Therefore, the signal is two dimensional in terms of the
spatial variables x and y. If sensors in several spectral
bands are available and well registered spatially, the
signals are three dimensional in terms of the variables x, y and λ.

In many other smart sensor systems, the field of
view of the sensor platform either does not change (as in
a synchronous orbit satellite with staring sensors) or is
stabilized (as in aircraft with step-staring sensors) or
is locked onto a target (as in missiles after they have
already acquired a target). In these cases, successive images
are registered. Both single frames of images and multiple
frames of images are available for signal processing. The
signal is then three dimensional in terms of x, y and t. In
addition, if multi-spectral sensors are registered, the signal
is four dimensional in terms of x, y, t and λ.

Therefore, signal processing operations required for
smart sensors are often multi-dimensional. This thesis is
concerned with adaptive spatial filters processing infrared
images. This type of spatial filter should be distinguished
from the majority of image processing methods, which are
concerned with the image itself as the signal of interest.
Our primary interest is in the targets. The image
itself, often called the background clutter, is considered
as noise and must be suppressed so that dim target signals
can be revealed to allow the application of a threshold to
initiate the detection process. In addition to the clutter,
the image may also include other noise, man-made interference
and jamming, which are all treated as noise. Only
targets are considered as signals.

3.  Multiple  Stages  "Smart  Sensor"  Signal  Processing 

To accomplish the objectives of most smart sensor
systems in detecting, tracking and recognizing very dim
targets deeply buried in noise, a multiple stage image
processing approach is generally needed (Table 1.1).


TABLE 1.1

IMAGE PROCESSING STAGES

Objective        Stage             Processing
------------------------------------------------------
Enhancement      Pre-threshold     Hard Limiting
                                   Adaptive Filtering
Detection        Threshold         Adaptive Threshold
                                   Target Acquisition
Tracking         Post-threshold    Kalman Tracker
Recognition                        Target Recognition

For more detail, see Chapter III.B.2.

This thesis will concentrate on the development of
new adaptive filter techniques which will be used in the
"Enhancement" stage to improve the "target signal" to
"background clutter noise" ratio by either suppressing the
background clutter or enhancing the target signal, or both.

B.  STATISTICAL  IMAGE  PROCESSING  TECHNIQUES  FOR 
ENHANCEMENT  OF  "TARGET  SIGNAL"  TO  "BACKGROUND 
NOISE"  RATIO  IN  INFRARED  IMAGES 

1. Introduction

Although the responsibility of detecting very dim
targets is shared by several steps of image processing in
pre-threshold, threshold and post-threshold stages, the
"enhancement" step before thresholding plays a very important
role because it is necessary to improve the "target signal"
to "clutter noise" ratio to approximately one before a
threshold operation can be applied. Otherwise, there will
be too many false alarms collected by the thresholding step,
which makes post-threshold signal processing difficult.
Therefore, in theoretical developments of new image processing
techniques for smart sensors, a great deal of attention
is given to background clutter suppression techniques for
enhancement of the signal to noise ratio before the
thresholding step.

We have made a survey of these techniques and present
them in several classifications in Table 1.2. First, they
are classified as nonadaptive, open loop adaptive and closed
loop adaptive. By "nonadaptive," we refer to those approaches
whose filters are not designed by using the image
characteristics. In the two adaptive cases, by contrast, the
filters are tailor-designed based on the characteristics learned
from the images being processed. In the open loop adaptive case,
the filter is not able to update or correct itself when the
characteristics of the image change. The image properties
must be "relearned" before a redesign of the filter can be
made. In the closed loop adaptive case, a feedback process
is provided between the filter output and the input to the
design process. In this way, any change in the image
characteristics will result in an increase of the output error,
which is used to automatically update and correct the filter
design.
TABLE 1.2

FOCAL PLANE PROCESSING TECHNIQUES FOR BACKGROUND CLUTTER SUPPRESSION

(Each entry lists a focal plane processing algorithm followed by
its active groups.)

NONADAPTIVE, DETERMINISTIC

  SPATIAL
    1st order, 2nd order (Laplacian), 4th order nonrecursive
    spatial filter
        Active groups: MIT Lincoln Laboratory*

  TEMPORAL
    Frame to frame differencing (nonrecursive temporal filter):
      1st and 2nd differencing
      3rd differencing
        Active groups: Grumman, Rockwell, Hughes

  SPATIAL-TEMPORAL
    Three dimensional spatial-temporal filter by variational method
        Active groups: Rockwell
    Pseudo-reticle nonrecursive spatial filter followed by
    recursive temporal bandpass or highpass filter
        Active groups: Optical Science

  SPATIAL-SPECTRAL
    Nonrecursive spatial filter followed by two color
    discrimination
        Active groups: MIT Lincoln Laboratory*

OPEN LOOP ADAPTIVE, DETERMINISTIC

  SPATIAL
    Background normalization (localized adaptive threshold)
        Active groups: General Electric*

  TEMPORAL
    Bandpass filter followed by adaptive threshold
        Active groups: Aerojet ElectroSystems*
    2nd, 3rd order recursive temporal highpass filter
        Active groups: Rockwell

OPEN LOOP ADAPTIVE, STATISTICAL

  SPATIAL
    Minimization of mean square error:
      Recursive Kalman filter (spatial)
        Active groups: Grumman, NPGS
      Nonrecursive Wiener filter (spatial)
        Active groups: Lockheed, NPGS
    Maximization of signal to noise ratio:
      Nonrecursive spatial matched filter
        Active groups: MIT Lincoln Laboratory*, NPGS
    Maximization of likelihood ratio
        Active groups: Aerospace Corp.

  TEMPORAL
    Minimization of mean square error:
      Nonrecursive temporal Wiener filter
        Active groups: Lockheed, NPGS
      Recursive temporal Kalman filter
    Maximization of signal to noise ratio
        Active groups: Hughes, NPGS

  SPATIAL-TEMPORAL
    Minimization of mean square error
        Active groups: NPGS
    Maximization of signal to noise ratio
        Active groups: NPGS

CLOSED LOOP ADAPTIVE, STATISTICAL

  SPATIAL
    Minimization of mean square error:
      Nonrecursive spatial filter
        Active groups: NPGS
    Maximization of signal to noise ratio
        Active groups: NPGS

  (domain label not legible)
    Minimization of mean square error
        Active groups: NPGS
    Maximization of signal to noise ratio
        Active groups: NPGS

* Techniques developed for tactical systems
These approaches are further classified as deterministic
and statistical. In deterministic cases, the filter
design is based on non-statistical properties of the image,
such as its frequency characteristics. In statistical cases,
the filter design is based on statistical properties of the
image, such as its autocorrelation or power spectral density.

Furthermore, they are classified according to the
types of signal processing operations used: spatial, temporal,
spectral or some combination of these.

2. Open Loop Adaptive Filter

In our research group, several nonrecursive
open loop adaptive filters have been developed. D. Bar
Yehoshua [4] first developed the nonrecursive statistical
spatial filters designed by a minimization of mean squared
error criterion using theoretically generated images based
on both the first and second order Markov models. These
images are all assumed to have zero mean. D. Hilmers [5]
extended these spatial filters to process real world images
which have non-zero mean. Further, he extended the same
concept to nonrecursive statistical temporal filters. B. Evenor
[6] made two additional extensions. First, he developed the
design procedures for spatial filters based on the
maximization of signal to noise ratio. Second, he developed a
closed loop adaptive spatial filter by extending the LMS (least
mean square) algorithm used by many one dimensional adaptive
filter researchers. This filter will be discussed further in the
next section.
Using several real world infrared test images, these
open loop adaptive filters have been found to be very effective
in suppressing background clutter for point targets. However,
they are not responsive to any change in the characteristics
of the image being processed.

3. Closed Loop Adaptive Filter and this Thesis

The realization of this lack of true adaptive capability
led to the study of B. Evenor [6], who developed the
nonrecursive closed loop adaptive spatial filter based on the
"LMS" algorithm and tested this approach on theoretically
generated images using Markov models. However, it was
discovered that the LMS algorithm is actually a simplified version
of a more general and powerful family of closed loop adaptive
filters. It was decided that the first part of this thesis
would be to develop such a general adaptive filter approach
which includes:

-  Two  optimization  criteria: 

Minimization  of  mean  square  error 
Maximization  of  signal  to  noise  ratio 

-  General  adaptation  equation  using  gradient  search 
models 

-  A  family  of  nonlinear  searching  techniques  to  carry 
out  the  adaptation  process. 

The  details  of  this  theoretical  study  will  be  presented  in 
Chapter  II. 
C. IMPLEMENTATION OF THE IMAGE PROCESSING PROGRAM
BY A MULTIPLE MICROCOMPUTER SYSTEM

1. Introduction

A parallel effort has been made in the investigation
of practical implementation of these statistical nonadaptive
image processing algorithms developed in our research group.
G. Hilimitzas [7] first investigated the execution speed and
accuracy of these image processing algorithms on a mainframe
computer, the IBM 360/67.

2. Microcomputer Implementation

D. Becker [8] investigated the performance of
implementations of the nonadaptive image processing algorithms on
one 16 bit LSI-11 microcomputer and on a combination of this
LSI-11 microcomputer and a microcomputer compatible CDA-MSP-3
array processor. It was found that, with high order language
programming and floating point data format, today's
microcomputer implementation is still in its infancy. Its execution
speed is slow and nowhere near real time processing
requirements. Improvements in microcomputer implementation
by using assembly language programming, integer data format
and improved programming on the array processor are currently
being developed.

3.  Multiple  Microcomputer  Implementation  and  this  Thesis 

It is obvious that to achieve real time image
processing performance using microcomputers, several improvements
should be considered simultaneously. First, the processing
capability of individual microcomputers must be improved by
more imaginative programming and by using attached special
processors, such as the array processor. Second, and
probably much more important, is to take advantage of the rapidly
increasing number of microcomputers affordable in a system
by cleverly orchestrating them into an effective concurrent
parallel and pipeline execution of the whole image processing
program. The advantages offered by this type of multiple
microcomputer approach do not stop at faster execution, but
also include multi-tasking and higher reliability because of
better fault tolerance. It was decided that to fully meet
the needs of new research for the successful development of
a smart sensor, a second part of this thesis should address
the implementation of image processing algorithms by a
multiple microcomputer system. Its details will be presented
in Chapter III.

D.  SCOPE  AND  EXTENSION  OF  THIS  THESIS 

It should be strongly emphasized that although this thesis
specifically developed a family of adaptive spatial filters
for the enhancement of target signal to noise ratio of images
and a multiple microcomputer system for the implementation
of the image processing, the motivation of this thesis is
to contribute to the development of smart sensor systems.
Therefore, the adaptive filter concepts and design techniques
are not limited to spatial filters only. They can
be readily extended to a wide class of problems with poor
signal to noise ratios. The implementation issue is not
limited to adaptive filter processing only. The multiple
microcomputer system is designed to implement not only the
mission signal processing but also a host of other signal/
data processing tasks for management, command, control and
communication functions.
II. ADAPTIVE IMAGE PROCESSING


A.  INTRODUCTION 
1.  General 


The idea of an adaptive filter is inherently attractive.
It does not take any stretch of imagination to see a
myriad of advantages offered by an adaptive filter which can
automatically update itself when it is not performing according
to an optimum criterion. The development of adaptive
filters started in the early 1960's when it was extended
from the sampled data control system [9] and when it was
developed for adaptive antenna applications [10]. In ensuing
years, a large number of investigations were made for
applications in antennas [11], noise cancellation [12] and a
variety of filtering applications [13-48].

It is natural that adaptive filter concepts are very
attractive for the objective of this thesis: to detect very
dim targets deeply buried in infrared background clutter.
However, a survey of adaptive filter research published in
the 1970's reveals the following facts:

a. Practically all of the past adaptive filter
research dealt with one dimensional problems.

b. LMS (least mean square) error has been the most
widely used criterion. Very little attention has
been given to other criteria, such as the
maximization of output signal to noise ratio, which is
probably better suited for detection problems.

c. Very little attention has been given to the convergence
speed of adaptive filters.

Therefore, we decided to address these three issues
and develop new adaptive image processing techniques which
are multi-dimensional, using either the mMSE (minimization
of mean square error) or the MSNR (maximization of signal to
noise ratio) criterion, and using a family of nonlinear
convergence techniques developed in the optimization field to
search for the extremum in the adaptive process.

However, the basic concept of the adaptive filter
and the traditional LMS approach will be briefly reviewed
first as a starting point to introduce the new techniques
developed in this thesis.

2. Basic Concepts of Adaptive Filters

The basic concepts of an adaptive filter can be
described concisely as follows:

The filter is represented by a vector H. In an
adaptive filter, H is updated in successive iteration steps,
indicated by a subscript as $H_K$. A correction term,
$\Delta H_K$, is generated in each iteration step such that

$$H_{K+1} = H_K + \Delta H_K \qquad (1.0)$$

The iteration steps are carried out to optimize a selected
performance function until the filter converges to its steady
state, which corresponds to reaching an extremum
of the performance function surface.
The filter H could be a temporal filter or a spatial
filter. It could be a recursive filter, also called an infinite
impulse response (IIR) or zero/pole filter, or a nonrecursive
filter, also called a finite impulse response (FIR) or
all zero filter.

The performance function could be the mean square
error, or the output signal to noise ratio, or other
functions such as the likelihood ratio. The optimization
objective could be either minimization or maximization.

In  this  thesis,  two  dimensional  spatial  filters  are 
considered.  They  are  the  nonrecursive  type.  Two  types  of 
cost  functions  are  used.  Their  optimization  objectives  are 
shown  in  the  following  table. 


TABLE II.0
OBJECTIVE FUNCTIONS

Adaptive Filter    Performance Function            Optimization Goal
--------------------------------------------------------------------
mMSE               Mean Square Error               Minimization
MSNR               Output Signal to Noise Ratio    Maximization

Let us consider a nonrecursive spatial filter with a filter
area of 3 by 3 pixels, which has nine filter coefficients.
The cost function is a surface in a nine dimensional space.
The goal of the iterative adaptation procedure is to search
the filter coefficient space for the coordinates of the extreme
point (either a minimum or a maximum) of the performance
function surface.
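
As an illustration only, the following Python sketch shows how a
3 by 3 nonrecursive spatial filter with nine coefficients could be
applied to an image. It is not taken from the thesis software. The
function names, the coefficient values in H_EXAMPLE and the small
test image are hypothetical, and border pixels are simply left at
zero.

```python
from typing import List

Image = List[List[float]]


def search_box(image: Image, row: int, col: int) -> List[float]:
    """Return the 9 pixels of the 3x3 box centred on (row, col), row-major."""
    return [image[r][c]
            for r in range(row - 1, row + 2)
            for c in range(col - 1, col + 2)]


def apply_filter(image: Image, h: List[float]) -> Image:
    """Apply a 9-coefficient nonrecursive filter to every interior pixel.

    Each output pixel is the inner product of the coefficient vector h
    with the signal vector taken from the 3x3 box around that pixel.
    Border pixels, where the box does not fit, are left at 0.0.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            x = search_box(image, r, c)
            out[r][c] = sum(hk * xk for hk, xk in zip(h, x))
    return out


# Hypothetical coefficients: centre pixel minus the mean of its 8 neighbours.
H_EXAMPLE = [-0.125, -0.125, -0.125,
             -0.125, 1.0, -0.125,
             -0.125, -0.125, -0.125]

# Hypothetical image: flat background of 10 with one bright pixel of 18.
IMAGE = [[10.0] * 5 for _ in range(5)]
IMAGE[2][2] = 18.0
FILTERED = apply_filter(IMAGE, H_EXAMPLE)
```

With these assumed coefficients the flat background is cancelled
and the bright pixel stands out, which is the behaviour wanted of a
clutter suppression filter. The adaptive procedures discussed in
this chapter search for the nine coefficients instead of fixing
them in advance.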
3. Traditional Approach - LMS Algorithm

An overwhelmingly large portion of the past adaptive
filter studies followed the approach originated by Professor
B. Widrow [14], commonly known as the LMS (least mean
square) algorithm.

The performance function used in this approach is
the "mean square error." The optimization goal is
"minimization." Prof. Widrow proposed that the adaptation term
$\Delta H$ be expressed as:

$$\Delta H = 2\mu e X$$

where $X$ = signal being processed
      $2\mu$ = a constant, called adaptive gain
      $e$ = adaptation error = $d - H^T X$
      $d$ = reference (or desired) signal
      $H$ = filter coefficient vector.

The adaptation equation is then

$$H_{K+1} = H_K + 2\mu e X$$

A steepest descent search technique is then used for
performing the adaptation steps.
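
The LMS adaptation equation above can be sketched in a few lines
of Python. This is a minimal illustration under stated assumptions,
not the implementation used in this thesis: the function names, the
adaptive gain value mu = 0.1 and the noise-free sample data are all
hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (signal vector X, desired value d)


def lms_step(h: Sequence[float], x: Sequence[float], d: float,
             mu: float) -> Tuple[List[float], float]:
    """One LMS update: H_next = H + 2*mu*e*X, with e = d - H^T X."""
    e = d - sum(hk * xk for hk, xk in zip(h, x))
    h_next = [hk + 2.0 * mu * e * xk for hk, xk in zip(h, x)]
    return h_next, e


def lms_adapt(samples: Sequence[Sample], n_coeffs: int, mu: float,
              passes: int) -> List[float]:
    """Sweep the samples repeatedly, starting from an all-zero filter."""
    h = [0.0] * n_coeffs
    for _ in range(passes):
        for x, d in samples:
            h = lms_step(h, x, d, mu)[0]
    return h


# Hypothetical noise-free data generated by d = 0.5*x0 - 0.25*x1.
SAMPLES: List[Sample] = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.25),
                         ([1.0, 1.0], 0.25), ([1.0, -1.0], 0.75)]
H_LMS = lms_adapt(SAMPLES, n_coeffs=2, mu=0.1, passes=100)
```

For this noise-free data the filter settles on the generating
coefficients. With noisy data and a constant gain the coefficients
would keep fluctuating around the optimum.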

Although this traditional LMS approach has been used
by most of the adaptive filter researchers, it is not without
certain drawbacks, which are briefly described as follows.

The adaptation equation used in the traditional
approach can be considered as a special case of a more general
adaptation equation,

$$H_{K+1} = H_K + a_K G_K$$

an equation commonly used in the field of optimization.
The term $G_K$ is sometimes called the "gradient", meaning the
gradient of the performance function surface. The term $a_K$
is sometimes called the "step size", meaning the displacement
in the vector space H. The optimization procedure at
iteration step K+1 gives a filter vector $H_{K+1}$ which is closer to
the optimal vector H* than previous filter vectors.
Therefore, Prof. Widrow's imaginative proposal can be interpreted
as the following two assumptions:

$$G_K = 2 e X$$

$$a_K = \mu = \text{a constant}$$

These two bold assumptions probably have resulted in several
inherent limitations.

a. Because the gradient $G_K$ is not tailored to the
performance function, convergence could be slow. Further, the
steady state filter result may not yield the best estimation.
Possibly, a steady state misadjustment could exist [24].

b. Because the step size $a_K$ is assumed to be a
constant, the adaptation procedure may never reach a steady
state.

4. This Thesis Research

In view of the results of the survey and review of
the status of the adaptive filter approach as presented above,
we identified a series of research problems which must be
investigated in order to develop adaptive image processing
techniques for suppressing background clutter in infrared
images and for helping the detection of dim targets.

First, we must extend the one dimensional adaptive
filter techniques based on the mMSE criterion to two
dimensions.

Second, we should develop an adaptive filter based
on the MSNR criterion, which is presented in section B.

Third, we should develop a new adaptive equation
which is more responsive to the performance function in order
to improve convergence speed and to minimize steady state
misadjustment. In other words, the adaptive equation is in
the form of

$$H_{K+1} = H_K + a_K G_K$$

The step size $a_K$ and gradient $G_K$ will not take the form of
$2\mu$ and $eX$ as is customarily done in practically all of the
past adaptive filter studies based on the LMS algorithm.

Fourth, we will investigate a variety of nonlinear
gradient techniques to search for the minimum in the case
of the mMSE filter and the maximum in the case of the MSNR filter.
They are derived and presented in sections C and D,
respectively.

The results of applying these adaptive spatial filters
to two real infrared images will be presented in section F.
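
To make the role of a variable step size concrete, the following
Python sketch applies the general adaptation equation to a
quadratic mean square error surface, choosing the search direction
from the true gradient and the step size by an exact line search at
every iteration. This steepest descent scheme is used here purely
as a familiar example. It is not claimed to be one of the search
techniques developed in this thesis, and the function names and the
2 by 2 correlation values are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def mat_vec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def adapt(r_xx: Matrix, r_xs: Sequence[float], h0: Sequence[float],
          steps: int) -> List[float]:
    """Iterate H_next = H + a*G on J = H^T R_xx H - 2 H^T R_xs + d.

    G is the negative gradient of J, and the step size a is the value
    that minimizes J along G (exact line search on a quadratic).
    """
    h = list(h0)
    for _ in range(steps):
        grad = [2.0 * (p - q) for p, q in zip(mat_vec(r_xx, h), r_xs)]
        g = [-gi for gi in grad]                  # search direction
        denom = 2.0 * dot(g, mat_vec(r_xx, g))
        if denom == 0.0:                          # gradient vanished
            break
        a = dot(g, g) / denom                     # step size for this step
        h = [hi + a * gi for hi, gi in zip(h, g)]
    return h


# Hypothetical correlation values for a two-coefficient filter.
R_XX = [[2.0, 1.0], [1.0, 2.0]]
R_XS = [1.0, 0.0]
H_ADAPTED = adapt(R_XX, R_XS, h0=[0.0, 0.0], steps=100)
```

Because both the direction and the step size are derived from the
performance function itself, the iteration settles on the minimum
rather than hovering around it.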


B.  DERIVATION  OF  OPTIMIZATION  CRITERIA 


1. Performance Function I - mMSE

The performance function based on the mMSE criterion
is derived for the nonrecursive spatial/temporal filter.
The nonrecursive spatial and temporal filters are described
by a set of filter coefficients, the vector H, over the area of a
"search-box"*. The observed signal in the "ith" "search-box"
is represented by the signal vector $X_i$. The estimated target
intensity $\hat{S}_i$ within the search-box is obtained by the
linear filter

$$\hat{S}_i = H^T X_i \qquad (2.00)$$

This process is carried out throughout the whole image.

The nonrecursive filter is represented by the vector

$$H^T = [H(1), H(2), \ldots, H(N)] \qquad (2.01)$$

where N is the number of pixels in the filter "search-box"
and the superscript T denotes the transpose of the vector.
The image signal within the "ith" filter "search-box" is
described by the vector:

$$X_i^T = [X_i(1), X_i(2), \ldots, X_i(N)] \qquad (2.02)$$

Throughout this thesis, matrices will be denoted by a "~"
under the symbol. Vectors will be denoted by a "_" under
the symbol.

* See Fig. 2.0

The estimation error is defined as:
$$\varepsilon_i \triangleq \hat{S}_i - S_i \qquad (2.03)$$

where $S_i$ is the signal and $\hat{S}_i$ the estimated signal in
the "ith" "search-box".

The mMSE (minimization of mean square error)
performance function is defined as:

$$J \triangleq E[\varepsilon_i \cdot \varepsilon_i^T] \qquad (2.04)$$

where $E[\cdot]$ denotes the expected value. Substitution of
(2.00) and (2.03) into (2.04) gives:

$$J = E[(H^T X_i - S_i)(H^T X_i - S_i)^T]
    = E[H^T X_i X_i^T H - 2 H^T X_i S_i + S_i^2] \qquad (2.05)$$

Since the filter value is fixed for an image, it can be
moved out of the expectation operation to give:

$$J = H^T E[X_i X_i^T] H - 2 H^T E[X_i S_i] + E[S_i^2] \qquad (2.06)$$

In order to simplify (2.06), the following terms are
defined:

(1) The autocorrelation matrix $R_{XX}$ of the observed image
is:

$$R_{XX} \triangleq E[X_i X_i^T] \qquad (2.07)$$

Being a correlation matrix, it is a symmetric and positive
definite matrix.

(2) The cross correlation vector between the observed
signal and the target signal of interest is:

$$R_{XS} \triangleq E[X_i S_i] \qquad (2.08)$$

(3) The mean square value of the target signal is:

$$d \triangleq E[S_i^2] \qquad (2.09)$$

Substitution of (2.07) through (2.09) into (2.06) gives:

$$J = H^T R_{XX} H - 2 H^T R_{XS} + d \qquad (2.10)$$

Equation (2.10) is the performance function of the mMSE
criterion. It is a quadratic function in terms of the filter
vector H.


Theorem 2.01

The performance function (2.10) is a unimodal (i.e.,
has a single minimum) function if the autocorrelation matrix
$R_{XX}$ is positive definite.

Proof

The stationary points of the function (2.10) are
found by setting the gradient of (2.10) with respect to H
to zero.

$$\nabla_H J = 2 R_{XX} H - 2 R_{XS} = 0 \qquad (2.11)$$

$$H^* = R_{XX}^{-1} R_{XS} \qquad (2.12)$$

Equation (2.12) is the optimum filter vector which minimizes
the cost function (2.10). In order to prove that the cost
function is minimized for H*, the second gradient of (2.10)
with respect to H is taken.

$$\nabla_H(\nabla_H J) = 2 R_{XX} \qquad (2.13)$$

Since $R_{XX}$ is positive definite, the cost function is
minimized. The minimum value is

$$J_{min} = d - R_{XS}^T R_{XX}^{-1} R_{XS} \qquad (2.14)$$

It is obtained by substituting the optimum filter vector
H* into (2.10).

The second derivative of the cost function J, as
described in (2.13), is called the Hessian matrix.

If the autocorrelation matrix is singular, the cost
function (2.10) is no longer unimodal because (2.11) can be
set to zero for an infinite number of filter vectors H.

It can be shown [49] that for such a case, a minimal
solution can be obtained [50, 51] by using the pseudo inverse
$R_{XX}^{+}$ of $R_{XX}$:

$$H^* = R_{XX}^{+} R_{XS}$$

The solution is not unique.
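
As a numerical check of the result of Theorem 2.01, the following
Python sketch solves $R_{XX} H^* = R_{XS}$ for a small
nonsingular case and evaluates the cost function (2.10) at the
solution. It is an illustration under stated assumptions only: the
function names are hypothetical, the 2 by 2 correlation values and
d = 1 are invented for the example, and the singular case is not
covered.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mmse_cost(h: Sequence[float], r_xx: Matrix, r_xs: Sequence[float],
              d: float) -> float:
    """Evaluate J = H^T R_xx H - 2 H^T R_xs + d."""
    r_h = [dot(row, h) for row in r_xx]
    return dot(h, r_h) - 2.0 * dot(h, r_xs) + d


# Hypothetical correlation values for a two-coefficient filter.
R_XX = [[2.0, 1.0], [1.0, 2.0]]
R_XS = [1.0, 0.0]
D = 1.0

H_STAR = solve(R_XX, R_XS)           # optimum filter vector
J_MIN = D - dot(R_XS, H_STAR)        # minimum value of the cost function
```

Evaluating mmse_cost at any other coefficient vector gives a value
above J_MIN, as expected for a surface with a single minimum.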


2.  Performance  Function  II  -  MSNR 


The  observed  signal  in  the  "search-box"  is  repre¬ 
sented  by  the  vector  X.  Let  us  assume  that  the  target  signal 
vector  S  and  the  clutter  noise  vector  N  are  additive: 


X  =  S  +  N 


(2.15) 


Applying  the  linear  filter  H  to  the  input  signal  vector  X, 
we  obtain: 


HTX  =  HT(S  +  N) 

T  T 
=  H1  S  +  H*N 


(2.16) 


Let  us  define  the  following  terms: 

SQ  A  S  =  target  signal  after  filtering  (2.17) 

NQ  “  HT N  =  clutter  noise  after  filtering  (2.18) 

The output signal to clutter noise ratio is then defined as:

    J \triangleq \frac{\text{power in the filtered image } H^T X \text{ due to target signal}}{\text{power in the filtered image } H^T X \text{ due to clutter noise}}    (2.19)

    J = \frac{E[S_0^2]}{E[N_0^2]}    (2.20)

where E[.] denotes the expected value. Substitution of
(2.17) and (2.18) into (2.20) gives:

    J = \frac{E[(H^T S)^2]}{E[(H^T N)^2]} = \frac{E[H^T S S^T H]}{E[H^T N N^T H]}    (2.21)

The filter vector H can be taken out of the expectation
operation:

    J = \frac{H^T E[S S^T] H}{H^T E[N N^T] H}    (2.22)

Let us define the signal autocorrelation matrix as:

    R_{SS} \triangleq E[S S^T]    (2.23)

and the clutter noise autocorrelation matrix as:

    R_{NN} \triangleq E[N N^T]    (2.24)

R_NN and R_SS are symmetric and positive definite. Substitution
of (2.23) and (2.24) in (2.22) yields:

    J = \frac{H^T R_{SS} H}{H^T R_{NN} H}    (2.25)

The function J in (2.25) is the performance function of the
MSNR criterion.

The filter vector H is obtained by maximizing J in
(2.25) with respect to H.

Theorem 2.02

The maximum of the objective function (2.25) is
equal to the largest eigenvalue of the matrix R_NN^{-1} R_SS, and
the optimum filter H* is the corresponding eigenvector.

Proof

The proof is based on the Cauchy-Schwarz inequality,
by finding the upper bound of J.

Since the autocorrelation matrix R_NN is symmetric
and positive definite, there exists a square nonsingular
matrix V which satisfies the relation [52].


function J reaches its maximum when the equality holds,
which occurs when:

    W = a \cdot PW    (2.35)

where a is a constant. Substituting (2.29) and (2.32) into
(2.35) yields (2.39):

    R_{NN} H^* = a R_{SS} H^*    (2.39)

Since R_NN is a positive definite matrix, its inverse R_NN^{-1}
exists. Multiplying (2.39) by (1/a) R_NN^{-1}, we obtain:

    (R_{NN}^{-1} R_{SS} - \frac{1}{a} I) H^* = 0    (2.40)

where I is the identity matrix.

Equation (2.40) is called the generalized eigenvalue-
eigenvector problem [52].

Substituting the H* of (2.40) into (2.25), we obtain
the maximum value of J. One can see that

    J_{max} = (1/a)_{max}    (2.41)

In other words, J_max is the largest eigenvalue of the matrix
R_NN^{-1} R_SS, and H* is the corresponding eigenvector. The noise
correlation matrix R_NN can be obtained by assuming some target
signal of interest S and using the observed signal X in the
following way (the signal and noise are assumed additive):

    R_{NN} = E[(X - S)(X - S)^T]

(Q.E.D.)
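
Theorem 2.02 can be checked numerically. The following is a minimal sketch, assuming synthetic correlation matrices and hypothetical helper names (`msnr_optimum`, `snr`). Instead of forming R_NN^{-1} R_SS directly, it whitens with the Cholesky factor of R_NN so that a symmetric eigenvalue routine can be used; this is a numerical convenience chosen here, not a step taken from the proof.

```python
import numpy as np

def msnr_optimum(R_ss, R_nn):
    """Largest eigenvalue and eigenvector of R_nn^{-1} R_ss, eq. (2.40)."""
    L = np.linalg.cholesky(R_nn)        # R_nn = L L^T (R_nn positive definite)
    L_inv = np.linalg.inv(L)
    M = L_inv @ R_ss @ L_inv.T          # symmetric, same eigenvalues as R_nn^{-1} R_ss
    lam, U = np.linalg.eigh(M)          # eigenvalues in ascending order
    H_star = L_inv.T @ U[:, -1]         # map the eigenvector back: H = L^{-T} u
    return lam[-1], H_star

def snr(H, R_ss, R_nn):
    """Performance function J of eq. (2.25)."""
    return (H @ R_ss @ H) / (H @ R_nn @ H)

# Synthetic 5-tap example (assumption): sample correlation matrices.
rng = np.random.default_rng(0)
S = rng.standard_normal((5, 300))
N = rng.standard_normal((5, 300)) * np.array([[1.0], [2.0], [0.5], [1.5], [3.0]])
R_ss = S @ S.T / 300
R_nn = N @ N.T / 300

J_max, H_star = msnr_optimum(R_ss, R_nn)
print(J_max, snr(H_star, R_ss, R_nn))
```

The two printed values agree, and no other filter vector gives a larger value of J.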


Theorem 2.03

The performance function J in (2.25) is in general a
multi-modal function.

Proof

Based on Theorem 2.02, the stationary points of the
performance function J satisfy the eigenvector equation (2.40):

    (R_{NN}^{-1} R_{SS} - \frac{1}{a} I) H^* = 0

In general, this equation has n different solutions, because
the matrix R_NN^{-1} R_SS in general can have n distinct eigenvalues,
and thus n corresponding eigenvectors. So, in general, the
performance function can have one absolute maximum and n-1
smaller local maxima.


Theorem 2.04

The performance function J is unimodal (has a single
maximum) if the matrices R_SS and R_NN are defined as in (2.23)
and (2.24).

Proof

The proof is based on the fact that R_SS is a dyad.

Use equation (2.40):

    (R_{NN}^{-1} R_{SS} - \frac{1}{a} I) H^* = 0

The matrix R_SS, being a dyad, can be written as:

    R_{SS} = r \cdot r^T    (2.42)

where r is a vector.

As mentioned before, for the nontrivial solution of
(2.40), the performance function is

    J = \frac{1}{a}    (2.43)

Using (2.42) and (2.43) in (2.40), we obtain

    R_{NN}^{-1} \cdot r r^T \cdot H^* = J H^*    (2.44)

Separating (2.44) into a product of a vector and a constant,
we obtain:

    (R_{NN}^{-1} r)(r^T H^*) = J \cdot H^*    (2.45)

For generality, a constant β can be used in the left side
of (2.45) to give:

    (\beta R_{NN}^{-1} r)(\frac{1}{\beta} r^T H^*) = J \cdot H^*    (2.46)

Comparing both sides of (2.46), we get:

    J_{max} = \frac{1}{\beta} \cdot r^T H^*    (2.47)

    H^* = \beta \cdot R_{NN}^{-1} \cdot r    (2.48)

Equation (2.47) shows that if R_SS is a dyad, the
performance function J has a unique stationary point
where it reaches its maximum. The general eigenvalue problem
has a single nonzero eigenvalue J_max and a corresponding
eigenvector: H* = β · R_NN^{-1} · r.

(Q.E.D.)
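
The dyad case lends itself to a short numerical check. The sketch below assumes a synthetic target template `r` and a sample noise matrix (both hypothetical), sets the constant β to 1, and confirms that R_NN^{-1} R_SS has exactly one nonzero eigenvalue, equal to the J_max of (2.47).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
r = rng.standard_normal(n)                  # hypothetical target template
N = rng.standard_normal((n, 400))
R_nn = N @ N.T / 400                        # sample noise autocorrelation
R_ss = np.outer(r, r)                       # dyad, eq. (2.42)

beta = 1.0                                  # arbitrary scale constant of (2.46)
H_star = beta * np.linalg.solve(R_nn, r)    # eq. (2.48)
J_max = (r @ H_star) / beta                 # eq. (2.47)

# Eigenvalues of R_nn^{-1} R_ss, computed from the symmetric whitened
# matrix L^{-1} R_ss L^{-T}, which has the same spectrum.
L_inv = np.linalg.inv(np.linalg.cholesky(R_nn))
eigs = np.linalg.eigvalsh(L_inv @ R_ss @ L_inv.T)
nonzero = int(np.sum(np.abs(eigs) > 1e-9 * np.abs(eigs).max()))

J_at_H_star = (H_star @ R_ss @ H_star) / (H_star @ R_nn @ H_star)
print(nonzero, eigs[-1], J_max, J_at_H_star)
```

Because only the direction of H* matters in (2.25), any nonzero β gives the same value of J.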


C.  DERIVATION  OF  SEARCHING  TECHNIQUES  FOR  EXTREMUM: 

GRADIENT  SEARCH  METHODS  FOR  THE  MINIMUM  OF  THE  mMSE 
PERFORMANCE  FUNCTIONS 

1. Steepest Descent Method (SD) and
the Best Step Adaptation Gain

The steepest descent method is a gradient method
which uses the Jacobian gradient (G = \nabla_H J) of the performance
function J to determine a suitable direction of search.
Gradient methods which use the Jacobian to determine the direction
of search are called first order methods. Gradient methods for
optimization are based on the Taylor expansion of the
performance function J, as given below:

    J(H + \Delta H) \cong J(H) + G^T \Delta H + \frac{1}{2} \Delta H^T A \Delta H    (2.49)

where G is the Jacobian gradient of J and A is the matrix of
second order partial derivatives called the Hessian matrix.
Equation (2.49) can be written in the form:

    J(H + \Delta H) \cong J(H) + \Delta J    (2.50)

The steepest descent uses only the Jacobian, so

    \Delta J \cong G^T \Delta H    (2.51)

In order to minimize the performance function J, we want to
generate a descending sequence of J which finally converges
to the minimum of J, J*. In other words, we want a negative
ΔJ, but:

    \Delta J = \|G\| \cdot \|\Delta H\| \cdot \cos\phi

where φ is the angle between the two vectors G and ΔH.
For maximum reduction of the cost function J,

    \phi = \pi    (2.52)

From (2.52), it is obvious that the change ΔH in
the filter vector H should be in the direction of the
negative gradient -G. This direction is called the steepest
descent direction.

The steepest descent step ΔH can be written in the
form:

    \Delta H = -\alpha \cdot G    (2.53)

where -G is called the step direction gradient and α the
step size. In adaptive filter terminology, α is called the
adaptation gain.

In order to generate an iterative method, one can
represent the filter vector H + ΔH as H_{K+1} and H as H_K.
Thus,

    H_{K+1} = H_K + \Delta H_K    (2.54)

Substituting (2.53) in (2.54), we obtain:

    H_{K+1} = H_K - \alpha_K \cdot G_K    (2.55)

Equation (2.55) is called the steepest descent iterative
method. For simplicity and without losing generality, the
negative sign will be included in α_K. Thus (2.55) becomes:

    H_{K+1} = H_K + \alpha_K G_K    (2.56)

If very small values of α_K are selected, the sequence {H_K}
will converge very slowly. In order to increase the speed
of convergence substantially, we choose the step sizes which
provide the biggest descent at each step. This concept is
called the "best step". The adaptation gain α_K is picked to
minimize J(H_{K+1}). This choice of α_K constitutes a one
dimensional minimization of the performance function J(H_{K+1}).

Lemma 2.05

Let J(H) be the performance function to be minimized.
Let the filter vector H_{K+1} be updated by the steepest descent
method (2.56). Then the "best step" towards the minimum of
J is obtained in every iteration if the adaptation gain
satisfies the relationship:

    G_{K+1}^T \cdot G_K = 0    (2.57)

where G_K is the Jacobian gradient of J with respect to the
filter vector H_K.

Proof

The performance function J(H_{K+1}) can be expressed as:

    J(H_{K+1}) = J(H_K + \alpha_K G_K)    (2.58)

The task is to find α_K which minimizes J(H_{K+1}) by setting
the derivative of J(H_{K+1}) with respect to α_K to zero:

    \frac{d J(H_{K+1})}{d \alpha_K} = 0    (2.59)

but H_{K+1} is a function of α_K, as shown in (2.56). Thus
(2.59) becomes:

    [\nabla_{H_{K+1}} J(H_{K+1})]^T \cdot \frac{d H_{K+1}}{d \alpha_K} = 0    (2.60)

Since G_{K+1} = \nabla_{H_{K+1}} J and d H_{K+1} / d α_K = G_K, (2.60) becomes:

    G_{K+1}^T G_K = 0    (2.61)

From (2.61), the best step concept requires orthogonality
between the two gradient vectors, G_{K+1} and G_K.

(Q.E.D.)


Up to this point, the cost function J was not
specified, and the derivation of the steepest descent was made
for any continuously differentiable function.

The mMSE performance function as given by (2.10)
can be written as:

    J_K = H_K^T R_{XX} H_K - 2 H_K^T R_{XS} + d    (2.62)

The gradient G_K of J_K with respect to H_K is given by:

    G_K \triangleq \nabla_{H_K} J_K = 2 (R_{XX} H_K - R_{XS})    (2.63)

From (2.63) and (2.56), G_{K+1} can be expressed as:

    G_{K+1} = 2 (R_{XX} H_{K+1} - R_{XS})    (2.64)




[Step 2] Compute the gradient:

    G_K = 2 (R_{XX} H_K - R_{XS})

[Step 3] Compute the adaptation gain:

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T G_K}{G_K^T R_{XX} G_K}

[Step 4] Update the filter vector:

    H_{K+1} = H_K + \alpha_K \cdot G_K

[Step 5] Test for stopping condition:

If G_K^T G_K ≤ ε, then terminate. Otherwise,
go to step 2.

The stopping criterion is chosen as G_K^T G_K ≤ ε because
the performance function is unimodal (has a single stationary
point), and we are looking for the stationary point, at which
the gradient vanishes.
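
The steps above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `sd_mmse`, the synthetic correlation data, the diagonal loading added to keep R_XX well conditioned, and the iteration cap `max_iter` are assumptions, not part of the method as stated. The stopping test is evaluated before the gain so that a zero gradient never enters the division.

```python
import numpy as np

def sd_mmse(R_xx, R_xs, H0, eps=1e-12, max_iter=10_000):
    """Steepest descent with the best-step gain for the mMSE cost (2.62)."""
    H = np.asarray(H0, dtype=float).copy()
    for k in range(max_iter):
        G = 2.0 * (R_xx @ H - R_xs)                 # Step 2, eq. (2.63)
        if G @ G <= eps:                            # Step 5
            return H, k
        alpha = -0.5 * (G @ G) / (G @ R_xx @ G)     # Step 3 (sign absorbed in alpha)
        H = H + alpha * G                           # Step 4, eq. (2.56)
    return H, max_iter

# Synthetic 5-tap problem (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
R_xx = X @ X.T / 200 + 0.5 * np.eye(5)
R_xs = rng.standard_normal(5)

H_sd, iters = sd_mmse(R_xx, R_xs, np.zeros(5))
H_direct = np.linalg.solve(R_xx, R_xs)              # closed-form minimizer for comparison
print(iters, np.linalg.norm(H_sd - H_direct))
```

With this gain, successive gradients are orthogonal, as Lemma 2.05 requires.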

2. Accelerated Steepest Descent Method (ASD)

The accelerated steepest descent method was first
introduced in 1964 by Shah, Buehler and Kempthorne [53].
Its purpose was to accelerate the convergence of the standard
steepest descent method. Its concept was incorporated in an
algorithm which converges to the minimum of any n dimensional
quadratic function in no more than 2n-1 steps. In practice,
this algorithm is not very efficient because of its
sensitivity to error propagation. For large n, error
propagation degrades the convergence rate, and the method sometimes
converges as slowly as, or even more slowly than, the steepest
descent method.


The adaptation gain of the ASD method is computed
using Lemma 2.05 and the fact that the adaptive filter
is updated by the iterative equation:

    H_{K+1} = H_K + \alpha_K \cdot V_K    (2.67)

From Lemma 2.05 and (2.67),

    G_{K+1}^T \cdot V_K = 0    (2.68)

but

    G_{K+1} = 2 (R_{XX} H_{K+1} - R_{XS})    [see (2.64)]
            = 2 (R_{XX} (H_K + \alpha_K V_K) - R_{XS})
            = 2 (R_{XX} H_K - R_{XS}) + 2 \alpha_K R_{XX} V_K
            = G_K + 2 \alpha_K R_{XX} V_K    (2.69)

Using (2.69) and (2.68), we obtain:

    (G_K + 2 \alpha_K R_{XX} V_K)^T \cdot V_K = 0    (2.70)

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K}    (2.71)

The accelerated steepest descent adaptive filter is
carried out by the following steps:

[Step 1] Set a starting filter vector H_0 = H_1, the stopping
bound ε, the correlation matrix R_XX, the cross
correlation vector R_XS, and the gradient G_0.

[Step 2] Compute the gradient of J.

    G_K = 2 (R_{XX} H_K - R_{XS})

[Step 3] Compute the step direction vector V_K.

    V_K = -G_K             for K = 2, 4, 6, ...
    V_K = H_K - H_{K-2}    for K = 3, 5, 7, ...

[Step 4] Compute the adaptation gain α_K.

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K}

[Step 5] Update the filter vector.

    H_{K+1} = H_K + \alpha_K V_K

[Step 6] Test for stopping condition.

If G_K^T · G_K ≤ ε, terminate. Otherwise go to
step 2.
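
A minimal sketch of these steps follows. Several details are assumptions made here rather than taken from the text: the function name `asd_mmse`, the synthetic data, the choice to take plain steepest descent steps until H_K - H_{K-2} is defined (K = 3 onward, with H_0 = H_1), and the fallback to -G_K when the acceleration direction carries no descent information. The full history of filter vectors is stored for clarity only; two previous vectors would suffice.

```python
import numpy as np

def asd_mmse(R_xx, R_xs, H_start, eps=1e-12, max_iter=10_000):
    """Accelerated steepest descent for the mMSE cost (2.62)."""
    start = np.asarray(H_start, dtype=float)
    hist = [start.copy(), start.copy()]              # hist[K] is H_K; H_0 = H_1 (Step 1)
    K = 1
    while K < max_iter:
        H = hist[K]
        G = 2.0 * (R_xx @ H - R_xs)                  # Step 2
        if G @ G <= eps:                             # Step 6
            break
        V = -G                                       # Step 3, even K (and start-up, assumed)
        if K % 2 == 1 and K >= 3:
            V_acc = H - hist[K - 2]                  # Step 3, odd K
            if abs(G @ V_acc) > 0.0:                 # assumed guard against a null step
                V = V_acc
        alpha = -0.5 * (G @ V) / (V @ R_xx @ V)      # Step 4, eq. (2.71)
        hist.append(H + alpha * V)                   # Step 5, eq. (2.67)
        K += 1
    return hist[K], K - 1

# Synthetic 5-tap problem (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
R_xx = X @ X.T / 200 + 0.5 * np.eye(5)
R_xs = rng.standard_normal(5)

H_asd, updates = asd_mmse(R_xx, R_xs, np.zeros(5))
H_direct = np.linalg.solve(R_xx, R_xs)
print(updates, np.linalg.norm(H_asd - H_direct))
```

Note that the gain (2.71) is unchanged if V_K is replaced by -V_K, so the sign convention of the step direction does not affect the update.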


3. Amir's Method (AMM)

This method was suggested by this author at the
beginning of the research. The purpose was to derive a method
which converges faster than the steepest descent method.
Experiments showed that the AMM method converges approximately
three times faster than the SD method, as shown in Fig. 2.6a.
This method is a non-conjugate gradient method and is not as
fast as the conjugate gradient methods, but it can replace
the SD method as a robust and faster method.

The AMM gradient search method was designed based
on the fact that the gradient of a unimodal performance
function vanishes only once, at the stationary point of the
performance function, which is the extremum point we are looking
for.

The adaptation procedure is derived in the following.
The functional Ψ_K is defined as:

    \Psi_K = G_K^T G_K    (2.71-1)

where G_K is the gradient of the performance function J, as
in (2.63).

The adaptive filter is updated as given in
(2.56) for the SD method. The adaptation gain α_K is computed
according to the "best step" concept, to minimize Ψ_{K+1}.
Using (2.64), we obtain:

    \Psi_{K+1} = G_{K+1}^T G_{K+1}    (2.71-2)
               = (G_K + 2 \alpha_K R_{XX} G_K)^T (G_K + 2 \alpha_K R_{XX} G_K)
               = G_K^T (I + 2 \alpha_K R_{XX}) (I + 2 \alpha_K R_{XX}) G_K

    \Psi_{K+1} = G_K^T (I + 4 \alpha_K R_{XX} + 4 \alpha_K^2 R_{XX}^2) G_K    (2.71-3)

In order to get the best step, we take the derivative of
Ψ_{K+1} with respect to the adaptation gain α_K and set it to
zero:

    \frac{\partial \Psi_{K+1}}{\partial \alpha_K} = G_K^T (4 R_{XX} + 8 \alpha_K R_{XX}^2) G_K
        = 4 G_K^T R_{XX} G_K + 8 \alpha_K G_K^T R_{XX}^2 G_K = 0    (2.71-4)

Solve (2.71-4) and get:

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T R_{XX} G_K}{G_K^T R_{XX}^2 G_K}    (2.71-5)


The AMM adaptive filter is implemented by the
following steps:

[Step 1] Set the initial filter vector H_0, the stopping
bound ε, the correlation matrix R_XX and the cross-
correlation vector R_XS.

[Step 2] Compute the gradient G_K of the performance function
J.

    G_K = 2 (R_{XX} H_K - R_{XS})

[Step 3] Compute the adaptation gain α_K:

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T R_{XX} G_K}{G_K^T R_{XX}^2 G_K}

[Step 4] Update the filter vector H_K.

    H_{K+1} = H_K + \alpha_K G_K

[Step 5] Test for stopping condition.

If Ψ_K = G_K^T G_K ≤ ε, then terminate, otherwise go to step 2.
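
A minimal sketch of the AMM iteration is given below, under stated assumptions: the gain is the one that minimizes Ψ_{K+1} = G_{K+1}^T G_{K+1} when the gradient is G = 2(R_XX H - R_XS), so that G_{K+1} = (I + 2 α_K R_XX) G_K; the function name `amm_mmse`, the synthetic data and the iteration cap are hypothetical.

```python
import numpy as np

def amm_mmse(R_xx, R_xs, H0, eps=1e-12, max_iter=10_000):
    """Gradient-norm minimizing search (AMM) for the mMSE cost (2.62)."""
    H = np.asarray(H0, dtype=float).copy()
    for k in range(max_iter):
        G = 2.0 * (R_xx @ H - R_xs)                 # Step 2
        if G @ G <= eps:                            # Step 5, Psi_K <= eps
            return H, k
        RG = R_xx @ G                               # R_xx symmetric: G^T R^2 G = (RG).(RG)
        alpha = -0.5 * (G @ RG) / (RG @ RG)         # Step 3
        H = H + alpha * G                           # Step 4
    return H, max_iter

# Synthetic 5-tap problem (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
R_xx = X @ X.T / 200 + 0.5 * np.eye(5)
R_xs = rng.standard_normal(5)

H_amm, iters = amm_mmse(R_xx, R_xs, np.zeros(5))
H_direct = np.linalg.solve(R_xx, R_xs)
print(iters, np.linalg.norm(H_amm - H_direct))
```

Each update reduces Ψ by (G^T R_XX G)^2 / (G^T R_XX^2 G), which is strictly positive whenever the gradient is nonzero and R_XX is positive definite.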

4. Fletcher-Reeves Conjugate Gradient Method (CGF)

The Fletcher-Reeves conjugate gradient (CGF) method was
first introduced in 1964 by Fletcher and Reeves [69]. The
method is similar to the pioneering work of Hestenes and Stiefel
[54]. The CGF method uses conjugate vectors as step
directions.

Definition

The vectors V_i, V_j are said to be "conjugate" with
respect to the matrix R_XX if they satisfy the following
condition:

    V_i^T R_{XX} V_j = 0    for i ≠ j and V_i, V_j ≠ 0.

The importance of this method is its fast convergence rate
for quadratic functions like (2.10). This method is
proved to converge in n steps, apart from rounding errors,
where n is the dimension of the filter vector.

The adaptation gain of the CGF method is computed
using Lemma 2.05 and the fact that the adaptive filter H_{K+1}
is updated by the iterative equation in (2.67). Following
equations (2.67) to (2.71) in a similar way, we
obtain the adaptation gain as:

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K}    (2.71-6)
The step direction vector is computed by the following
iterative procedure [55]:

    V_{K+1} = -G_{K+1} + \beta_K \cdot V_K    (2.73)

    \beta_K = \frac{G_{K+1}^T G_{K+1}}{G_K^T G_K}    (2.74)

The CGF method was once applied to the Rosenbrock
function [54], and the resulting performance was poor. Subsequently,
it was suggested to restart the method every n iterations,
where n is the dimension of the vector H. This thesis
confirmed that the convergence of this method for our two
performance functions (2.10) and (2.25) is faster if this method
of restarts is used.

The CGF adaptive filter is carried out in the
following steps:

[Step 1] Select a starting filter vector H_0, the stopping
bound ε, the auto-correlation matrix R_XX and the cross-
correlation vector R_XS.

[Step 2] Compute the gradient G_K of the performance
function J.

    G_K = \nabla J_K = 2 (R_{XX} H_K - R_{XS})

[Step 3] Compute the step direction vector V_K.

    V_K = -G_K                                                        if K mod n = 0
    V_K = -G_K + \frac{G_K^T G_K}{G_{K-1}^T G_{K-1}} \cdot V_{K-1}    else

[Step 4] Compute the adaptation gain:

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K}

[Step 5] Update the filter vector H_{K+1}.

    H_{K+1} = H_K + \alpha_K V_K

[Step 6] Test for stopping condition.

If G_K^T G_K ≤ ε, terminate the adaptation. Otherwise go to
step 2.
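
The CGF steps can be sketched as follows. The function name `cgf_mmse`, the synthetic data and the iteration cap are assumptions for illustration; the restart every n iterations follows Step 3.

```python
import numpy as np

def cgf_mmse(R_xx, R_xs, H0, eps=1e-18, max_iter=1000):
    """Fletcher-Reeves conjugate gradient with restarts for the mMSE cost."""
    H = np.asarray(H0, dtype=float).copy()
    n = H.size
    G = 2.0 * (R_xx @ H - R_xs)                      # Step 2
    V = -G                                           # Step 3 at K = 0
    for k in range(max_iter):
        if G @ G <= eps:                             # Step 6
            return H, k
        alpha = -0.5 * (G @ V) / (V @ R_xx @ V)      # Step 4, eq. (2.71-6)
        H = H + alpha * V                            # Step 5
        G_new = 2.0 * (R_xx @ H - R_xs)
        if (k + 1) % n == 0:
            V = -G_new                               # restart every n iterations
        else:
            beta = (G_new @ G_new) / (G @ G)         # eq. (2.74)
            V = -G_new + beta * V                    # eq. (2.73)
        G = G_new
    return H, max_iter

# Synthetic 5-tap problem (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
R_xx = X @ X.T / 200 + 0.5 * np.eye(5)
R_xs = rng.standard_normal(5)

H_cgf, iters = cgf_mmse(R_xx, R_xs, np.zeros(5))
H_direct = np.linalg.solve(R_xx, R_xs)
print(iters, np.linalg.norm(H_cgf - H_direct))
```

For a quadratic cost the iteration count is expected to stay close to the filter dimension n, apart from rounding effects.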


5. Polak-Ribière Conjugate Gradient Method (CGP)

The Polak-Ribière conjugate gradient (CGP) method
is similar to the CGF method. The difference is in the
computation of the search direction when K mod n ≠ 0. In [56],
Powell gave a theoretical reason for favoring the Polak-Ribière
algorithm. In this thesis, the author found the CGP method
more efficient and faster converging than the CGF method (see
Section F).

The search direction of the CGP method is given by
the following expression:

    V_K = -G_K + \frac{G_K^T (G_K - G_{K-1})}{G_{K-1}^T G_{K-1}} \cdot V_{K-1}    (2.75)

The CGP adaptive filter is carried out in the following steps:

[Step 1] Select a starting filter vector H_0, the stopping
bound ε, the auto-correlation matrix R_XX and the cross-
correlation vector R_XS.

[Step 2] Compute the gradient of the performance function
J.

    G_K = \nabla J = 2 (R_{XX} H_K - R_{XS})

[Step 3] Compute the step direction vector V_K.

    V_K = -G_K                                                                    if K mod n = 0
    V_K = -G_K + \frac{G_K^T (G_K - G_{K-1})}{G_{K-1}^T G_{K-1}} \cdot V_{K-1}    else

[Step 4] Compute the adaptation gain.

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K}

[Step 5] Update the filter vector.

    H_{K+1} = H_K + \alpha_K \cdot V_K

[Step 6] Test for stopping condition.

If G_K^T G_K ≤ ε, terminate the adaptation. Otherwise go
to step 2.
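
A minimal sketch of the CGP steps is shown below; it differs from a Fletcher-Reeves iteration only in the coefficient that multiplies the previous direction, eq. (2.75). The function name `cgp_mmse`, the synthetic data and the iteration cap are assumptions.

```python
import numpy as np

def cgp_mmse(R_xx, R_xs, H0, eps=1e-18, max_iter=1000):
    """Polak-Ribiere conjugate gradient with restarts for the mMSE cost."""
    H = np.asarray(H0, dtype=float).copy()
    n = H.size
    G = 2.0 * (R_xx @ H - R_xs)                          # Step 2
    V = -G                                               # Step 3 at K = 0
    for k in range(max_iter):
        if G @ G <= eps:                                 # Step 6
            return H, k
        alpha = -0.5 * (G @ V) / (V @ R_xx @ V)          # Step 4
        H = H + alpha * V                                # Step 5
        G_new = 2.0 * (R_xx @ H - R_xs)
        if (k + 1) % n == 0:
            V = -G_new                                   # restart every n iterations
        else:
            coeff = (G_new @ (G_new - G)) / (G @ G)      # Polak-Ribiere coefficient, eq. (2.75)
            V = -G_new + coeff * V
        G = G_new
    return H, max_iter

# Synthetic 5-tap problem (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
R_xx = X @ X.T / 200 + 0.5 * np.eye(5)
R_xs = rng.standard_normal(5)

H_cgp, iters = cgp_mmse(R_xx, R_xs, np.zeros(5))
H_direct = np.linalg.solve(R_xx, R_xs)
print(iters, np.linalg.norm(H_cgp - H_direct))
```

On an exactly quadratic cost with best-step gains, successive gradients are orthogonal, so the two coefficients coincide; differences between CGF and CGP appear for nonquadratic functions such as (2.25).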


6. Davidon-Fletcher-Powell Variable Metric Method (DFP)

One of the most efficient searching methods is the
Davidon-Fletcher-Powell (DFP) method. It was developed by Fletcher
and Powell [57] from the variable metric method due to
Davidon [54, 58]. The term "variable metric" was coined by
Davidon to describe methods which at the Kth iteration utilize
an increment of the form

    \Delta H = -\alpha_K A_K G_K    (2.76)

and update the metric-correction transformation A_K from
iteration to iteration. The DFP method updates the metric
A_K by the iterative expression:

    A_{K+1} = A_K + \alpha_K \frac{V_K V_K^T}{V_K^T P_K} - \frac{A_K P_K P_K^T A_K}{P_K^T A_K P_K}    (2.77)

where:

    V_K = -A_K G_K    (2.78)

    P_K = G_{K+1} - G_K    (2.79)

Fletcher and Powell proved that, for a general
function J, a positive definite A_K implies that A_{K+1} is also
positive definite [58]. For the performance function J
given in (2.10), it can be shown [59] that the set
{α_K A_K G_K} is a set of conjugate directions, so the DFP
method exhibits quadratic termination in n steps.

The adaptation gain of the DFP adaptive filter is based
on the best step concept introduced in Lemma 2.05.

Using the filter update of the DFP method:

    H_{K+1} = H_K + \alpha_K V_K

the adaptation gain is found to be:

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K}    (2.80)
The adaptive filter designed by the DFP method is
carried out in the following steps:

[Step 1] Select a starting filter vector H_0, the starting
correction metric A_0 = I (where I is the identity matrix),
the gradient G_0, the stopping bound ε, the autocorrelation
matrix R_XX and the crosscorrelation vector R_XS.

[Step 2] Compute the step direction vector V_K:

    V_K = -A_K G_K

[Step 3] Compute the adaptation gain.

    \alpha_K = -\frac{1}{2} \cdot \frac{G_K^T V_K}{V_K^T R_{XX} V_K}

[Step 4] Update the filter vector H_K.

    H_{K+1} = H_K + \alpha_K V_K

[Step 5] Compute the gradient G_{K+1} of the function J.

    G_{K+1} = 2 (R_{XX} H_{K+1} - R_{XS})

[Step 6] Compute the vector P_K.

    P_K = G_{K+1} - G_K

[Step 7] Update the variable metric A_K.

    A_{K+1} = A_K + \alpha_K \frac{V_K V_K^T}{V_K^T P_K} - \frac{A_K P_K P_K^T A_K}{P_K^T A_K P_K}

[Step 8] Test for stopping criterion.

If G_K^T · G_K ≤ ε, terminate the adaptation, otherwise
go to step 2.
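
The DFP steps can be sketched as below. The function name `dfp_mmse`, the synthetic data and the iteration cap are assumptions; the stopping test is applied to the most recently computed gradient before each new step so that a vanishing gradient never enters the metric update.

```python
import numpy as np

def dfp_mmse(R_xx, R_xs, H0, eps=1e-18, max_iter=1000):
    """Davidon-Fletcher-Powell variable metric search for the mMSE cost."""
    H = np.asarray(H0, dtype=float).copy()
    A = np.eye(H.size)                                   # Step 1, A_0 = I
    G = 2.0 * (R_xx @ H - R_xs)
    for k in range(max_iter):
        if G @ G <= eps:                                 # Step 8
            return H, A, k
        V = -A @ G                                       # Step 2, eq. (2.78)
        alpha = -0.5 * (G @ V) / (V @ R_xx @ V)          # Step 3, eq. (2.80)
        H = H + alpha * V                                # Step 4
        G_new = 2.0 * (R_xx @ H - R_xs)                  # Step 5
        P = G_new - G                                    # Step 6, eq. (2.79)
        AP = A @ P                                       # A symmetric: A P P^T A = outer(AP, AP)
        A = A + alpha * np.outer(V, V) / (V @ P) - np.outer(AP, AP) / (P @ AP)   # Step 7
        G = G_new
    return H, A, max_iter

# Synthetic 5-tap problem (assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
R_xx = X @ X.T / 200 + 0.5 * np.eye(5)
R_xs = rng.standard_normal(5)

H_dfp, A_final, iters = dfp_mmse(R_xx, R_xs, np.zeros(5))
H_direct = np.linalg.solve(R_xx, R_xs)
print(iters, np.linalg.norm(H_dfp - H_direct))
```

Unlike the conjugate gradient sketches, this method stores and updates an n by n matrix, which trades memory for the curvature information accumulated in A_K.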

D. DERIVATION OF SEARCHING TECHNIQUES FOR EXTREMUM,

GRADIENT SEARCH METHODS FOR THE MAXIMUM OF THE
MSNR PERFORMANCE FUNCTION

1. Approximation for Best Step Adaptation Gain

The maximum signal to noise ratio (MSNR) performance
function J, as defined in (2.25), is a nonlinear
function of the filter vector H. The function J, being
nonquadratic, introduces new difficulties. The methods which
have been theoretically proved to converge in n steps for
quadratic cases like the mMSE no longer converge as fast.

The adaptation gain can no longer be efficiently computed
by the best step concept because of the large amount of
computation required to obtain the best step. In order to
make this gradient search method efficient, the adaptation
gain is approximated from the "best step" concept so as to generate
a nondecreasing sequence of performance function values {J_K} which
finally converges to the maximum of J.

Lemma 2.06

Let the performance function J be defined as in
(2.25), and the adaptive filter be updated according to
(2.67). Then the best step adaptation gain at iteration step
K satisfies the relation:

    \alpha_K = -\frac{H_K^T (R_{SS} - J_{K+1} R_{NN}) V_K}{V_K^T (R_{SS} - J_{K+1} R_{NN}) V_K}    (2.81)

Proof

The proof is based on Lemma 2.05. Using (2.67) and
Lemma 2.05, we obtain:

    G_{K+1}^T \cdot V_K = 0    (2.82)

where G_{K+1} is the gradient of the performance function J
at the K+1 iteration step, and V_K is the step direction
(search direction) vector. But according to (2.25),

    J_{K+1} = \frac{H_{K+1}^T R_{SS} H_{K+1}}{H_{K+1}^T R_{NN} H_{K+1}}

The gradient of J_{K+1} with respect to H_{K+1} is:

    G_{K+1} \triangleq \nabla_{H_{K+1}} J_{K+1} = \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} \cdot (R_{SS} - J_{K+1} R_{NN}) \cdot H_{K+1}    (2.83)

Using (2.83) in (2.82), we obtain:

    \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} \cdot H_{K+1}^T (R_{SS} - J_{K+1} R_{NN}) \cdot V_K = 0    (2.84)

But according to (2.67),

    H_{K+1} = H_K + \alpha_K \cdot V_K

and R_SS, R_NN being symmetric and positive definite gives:

    (H_K + \alpha_K V_K)^T (R_{SS} - J_{K+1} R_{NN}) \cdot V_K = 0    (2.85)

So,

    H_K^T (R_{SS} - J_{K+1} R_{NN}) V_K + \alpha_K \cdot V_K^T (R_{SS} - J_{K+1} R_{NN}) V_K = 0

    \alpha_K = -\frac{H_K^T (R_{SS} - J_{K+1} R_{NN}) V_K}{V_K^T (R_{SS} - J_{K+1} R_{NN}) V_K}    (2.86)

Q.E.D.

The adaptation gain in (2.86) cannot be computed directly
because it is a function of J_{K+1}, which itself cannot be
computed without α_K. Thus the "best step" concept introduces
a nonlinear problem for the MSNR performance function. In
order to avoid solving a nonlinear equation
in each iteration, the adaptation gain will be approximated
by using J_K instead of J_{K+1}. Since J_K is obtained one step
prior to J_{K+1}, J_{K+1} does not need to be solved for. Now we must
prove that this choice of adaptation gain for the MSNR
performance function will generate a nondecreasing sequence
{J_K} which eventually converges to the optimum J.

Lemma 2.07

Let the performance function J be defined as in (2.25)
and the filter H_K be updated by (2.67). Let the adaptation
gain α_K be given by

    \alpha_K = -\frac{H_K^T (R_{SS} - J_K R_{NN}) V_K}{V_K^T (R_{SS} - J_K R_{NN}) V_K}    (2.87)

Then it will generate a nondecreasing sequence {J_K} which
converges to the maximum of J.

Proof

Using (2.25), we obtain:

    J_{K+1} = \frac{H_{K+1}^T R_{SS} H_{K+1}}{H_{K+1}^T R_{NN} H_{K+1}}    (2.88)

Substitution of (2.67) in (2.88) gives:

    J_{K+1} = \frac{(H_K + \alpha_K V_K)^T R_{SS} (H_K + \alpha_K V_K)}{(H_K + \alpha_K V_K)^T R_{NN} (H_K + \alpha_K V_K)}

            = J_K \cdot \frac{1 + 2 \alpha_K \frac{V_K^T R_{SS} H_K}{H_K^T R_{SS} H_K} + \alpha_K^2 \frac{V_K^T R_{SS} V_K}{H_K^T R_{SS} H_K}}{1 + 2 \alpha_K \frac{V_K^T R_{NN} H_K}{H_K^T R_{NN} H_K} + \alpha_K^2 \frac{V_K^T R_{NN} V_K}{H_K^T R_{NN} H_K}}    (2.89)

Equation (2.89) is simplified due to the fact that R_SS, R_NN
are symmetric and positive definite.

In order to obtain a non-decreasing sequence {J_K}, we
must satisfy J_{K+1} ≥ J_K; but the sequence {J_K} is positive,
so:

    \frac{J_{K+1}}{J_K} \geq 1    (2.90)

In order for (2.89) to satisfy (2.90), it can be seen
that

    1 + 2 \alpha_K \frac{V_K^T R_{SS} H_K}{H_K^T R_{SS} H_K} + \alpha_K^2 \frac{V_K^T R_{SS} V_K}{H_K^T R_{SS} H_K} \geq 1 + 2 \alpha_K \frac{V_K^T R_{NN} H_K}{H_K^T R_{NN} H_K} + \alpha_K^2 \frac{V_K^T R_{NN} V_K}{H_K^T R_{NN} H_K}    (2.91)

Using (2.91), (2.25) and the fact that R_NN, R_SS are positive
definite matrices, we obtain:

    2 V_K^T R_{SS} H_K + \alpha_K V_K^T R_{SS} V_K \geq J_K (2 V_K^T R_{NN} H_K + \alpha_K V_K^T R_{NN} V_K)    (2.92)

    \alpha_K \cdot V_K^T (R_{SS} - J_K R_{NN}) V_K \geq -2 \cdot V_K^T (R_{SS} - J_K R_{NN}) H_K

    \alpha_K \geq -2 \cdot \frac{V_K^T (R_{SS} - J_K R_{NN}) H_K}{V_K^T (R_{SS} - J_K R_{NN}) V_K}    (2.93)

So the adaptation gain given in (2.87) generates a
nondecreasing sequence {J_K} because it satisfies (2.93).

Q.E.D.


Lemma 2.08

Let the performance function J be defined as in (2.25)
and the filter vector be updated by (2.67). Then for
each iteration step K, the gradient G_K of J is orthogonal
to the filter vector H_K, regardless of the adaptation gain.

Proof

The performance function given by (2.25) is

    J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K}

The gradient of J_K with respect to the filter vector H_K is
given by

    G_K \triangleq \nabla_{H_K} J_K = \frac{2}{H_K^T R_{NN} H_K} \cdot (R_{SS} - J_K R_{NN}) \cdot H_K    (2.83)

From (2.83), it follows that:

    H_K^T G_K = \frac{2}{H_K^T R_{NN} H_K} \cdot H_K^T (R_{SS} - J_K R_{NN}) H_K    (2.94)

    H_K^T G_K = 2 \cdot \left( \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K} - J_K \cdot \frac{H_K^T R_{NN} H_K}{H_K^T R_{NN} H_K} \right)    (2.95)

Using (2.25) in (2.95), we obtain:

    H_K^T G_K = 2 (J_K - J_K) = 0

Therefore,

    H_K^T G_K = 0    (2.96)

Thus the filter vector at iteration step K is
orthogonal to the gradient G_K of the performance function J.

Q.E.D.
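
Lemma 2.08 and the gradient expression (2.83) can both be checked numerically. The sketch below assumes synthetic correlation matrices and a hypothetical helper `msnr_gradient`; it compares (2.83) against central finite differences and evaluates the inner product H^T G.

```python
import numpy as np

def msnr_gradient(H, R_ss, R_nn):
    """Return (J, G): the ratio (2.25) and its gradient (2.83)."""
    den = H @ R_nn @ H
    J = (H @ R_ss @ H) / den
    G = (2.0 / den) * ((R_ss - J * R_nn) @ H)
    return J, G

# Synthetic 4-tap example (assumption).
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 50))
B = rng.standard_normal((4, 50))
R_ss = A @ A.T / 50
R_nn = B @ B.T / 50
H = rng.standard_normal(4)

J, G = msnr_gradient(H, R_ss, R_nn)

# Central finite differences of J along each coordinate axis.
step = 1e-6
G_fd = np.zeros(4)
for i in range(4):
    e = np.zeros(4)
    e[i] = step
    G_fd[i] = (msnr_gradient(H + e, R_ss, R_nn)[0]
               - msnr_gradient(H - e, R_ss, R_nn)[0]) / (2.0 * step)

print(np.max(np.abs(G - G_fd)), H @ G)
```

The orthogonality reflects the fact that J does not change when H is rescaled, so J has no slope along H itself.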

2. Steepest Descent Method (SD)

The steepest descent (SD) method as described for
the quadratic mMSE performance function can be used here for the
MSNR performance function, with some exceptions:

The concept of the "best step" is used through an
approximation of the best adaptation gain. The gradient of the
performance function with respect to the filter vector is a
function of the performance function. Thus, successive
values of the performance function must be computed.

The adaptation equation used here is identical to
(2.56):

    H_{K+1} = H_K + \alpha_K \cdot G_K

The adaptation gain is obtained from (2.87) by replacing
the step direction vector V_K with the gradient G_K (the
direction of the SD).

The adaptation gain obtained is:

    \alpha_K = -\frac{H_K^T (R_{SS} - J_K R_{NN}) G_K}{G_K^T (R_{SS} - J_K R_{NN}) G_K}    (2.97)

The matrix Q_K is defined as:

    Q_K \triangleq R_{SS} - J_K R_{NN}    (2.98)

Substituting (2.98) into (2.97), we obtain:

    \alpha_K = -\frac{H_K^T Q_K G_K}{G_K^T Q_K G_K}    (2.99)

The adaptive filter designed by the SD method is carried out
in the following steps:

[Step 1] Select a starting filter vector H_0 and a stopping
bound δ.

[Step 2] Compute the performance function J_K as in (2.25):

    J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K}

[Step 3] Compute the gradient G_K = \nabla_{H_K} J_K:

    G_K = \frac{2}{H_K^T R_{NN} H_K} \cdot Q_K \cdot H_K

where Q_K is given by (2.98).

[Step 4] Compute the adaptation gain:

    \alpha_K = -\frac{H_K^T Q_K G_K}{G_K^T Q_K G_K}

[Step 5] Update the filter vector:

    H_{K+1} = H_K + \alpha_K G_K

[Step 6] Test for stopping condition.

If ||H_{K+1} - H_K|| / ||H_K|| ≤ δ, then terminate the adaptation.
Otherwise, go to step 2.

The stopping condition is different from the one used
for the mMSE criterion because in this case the gradient G_K
is a nonlinear function of the filter, and when ||H_K|| → ∞ the
gradient G_K → 0 (use (2.83) to verify). Thus, a vanishing
gradient does not necessarily indicate the stationary point;
the gradient can also vanish when the system diverges.
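
A minimal sketch of these steps is given below. It departs from the listed steps in two places, both of which are assumptions introduced here and not part of the method as stated: when G_K^T Q_K G_K is not negative the gain (2.99) is replaced by an arbitrary positive gain (moving along the gradient then raises J for any positive gain, because H_K^T Q_K H_K = 0 and H_K^T Q_K G_K > 0), and the filter is rescaled to unit norm after each update, which leaves J unchanged. The names `sd_msnr` and `snr` and the synthetic dyad example are hypothetical.

```python
import numpy as np

def snr(H, R_ss, R_nn):
    """Performance function J of eq. (2.25)."""
    return (H @ R_ss @ H) / (H @ R_nn @ H)

def sd_msnr(R_ss, R_nn, H0, delta=1e-10, max_iter=5000):
    """Gradient ascent on J with the approximate best-step gain (2.99)."""
    H = np.asarray(H0, dtype=float)
    H = H / np.linalg.norm(H)
    J_hist = []
    for _ in range(max_iter):
        den = H @ R_nn @ H
        J = (H @ R_ss @ H) / den                  # Step 2
        J_hist.append(J)
        Q = R_ss - J * R_nn                       # eq. (2.98)
        G = (2.0 / den) * (Q @ H)                 # Step 3
        g_norm = np.linalg.norm(G)
        if g_norm == 0.0:
            break
        curv = G @ Q @ G
        if curv < 0.0:
            alpha = -(H @ Q @ G) / curv           # Step 4, eq. (2.99)
        else:
            alpha = np.linalg.norm(H) / g_norm    # assumed fallback gain (see lead-in)
        H_new = H + alpha * G                     # Step 5
        rel_step = np.linalg.norm(H_new - H) / np.linalg.norm(H)
        H = H_new / np.linalg.norm(H_new)         # assumed rescaling, J is scale invariant
        if rel_step <= delta:                     # Step 6
            break
    return H, J_hist

# Synthetic example (assumption): dyad signal matrix, sample noise matrix.
rng = np.random.default_rng(3)
s = rng.standard_normal(5)
N = rng.standard_normal((5, 300))
R_ss = np.outer(s, s)
R_nn = N @ N.T / 300

H_fin, J_hist = sd_msnr(R_ss, R_nn, rng.standard_normal(5))
J_fin = snr(H_fin, R_ss, R_nn)
J_max = s @ np.linalg.solve(R_nn, s)              # closed-form maximum for a dyad R_ss
print(len(J_hist), J_fin, J_max)
```

The recorded values in `J_hist` form a nondecreasing sequence bounded above by `J_max`, which is the behaviour Lemma 2.07 describes.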
3. Accelerated Steepest Descent Method (ASD)

The ASD method derived in (II.C.2) is applied in
this section, with some modifications, to design an adaptive
filter which maximizes the performance function J in (2.25).
The adaptive filter is updated according to (2.67):

    H_{K+1} = H_K + \alpha_K \cdot V_K

The step direction vector V_K is computed from the filter
vector H_K and the gradient G_K of the performance function:

    V_K = -G_K             for K = 2, 4, 6, ...
    V_K = H_K - H_{K-2}    for K = 3, 5, 7, ...

The gradient G_K is obtained from (2.83) and the adaptation
gain from (2.87) and Lemma 2.07.

The adaptive filter designed by the ASD method is
carried out in the following steps:

[Step 1] Set a starting filter vector H_0 = H_1, a stopping
bound δ, and compute the performance function J_0 and the
gradient G_0.

[Step 2] Compute the performance function value J_K at
iteration step K.

[Step 3] Compute the gradient G_K of J_K with respect to H_K.

    G_K = \frac{2}{H_K^T R_{NN} H_K} \cdot Q_K \cdot H_K

where Q_K is given by (2.98).

[Step 4] Compute the step direction vector V_K.

    V_K = -G_K             for K = 2, 4, 6, ...
    V_K = H_K - H_{K-2}    for K = 3, 5, 7, ...

[Step 5] Compute the adaptation gain:

    \alpha_K = -\frac{H_K^T Q_K V_K}{V_K^T Q_K V_K}

[Step 6] Update the filter vector:

    H_{K+1} = H_K + \alpha_K \cdot V_K

[Step 7] Test for stopping condition:

If ||H_{K+1} - H_K|| / ||H_K|| ≤ δ, then terminate the adaptation,
otherwise go to step 2.


4. Fletcher-Reeves Conjugate Gradient Method (CGF)

The Fletcher-Reeves conjugate gradient (CGF) method
is applied to the MSNR adaptive filter in a similar way as
for the mMSE adaptive filter. However, the nonlinear MSNR
performance function requires more computation and does not
use the true "best step" but an approximation. The "restart"
concept was used and was found to accelerate the
convergence.

The adaptive filter based on the CGF method is
updated by the following iterative scheme:

    H_{K+1} = H_K + \alpha_K V_K    (2.100)

The step direction vector V_K is obtained as in (II.C.4) by
the expression:

    V_K = -G_K                                                        if K mod n = 0
    V_K = -G_K + \frac{G_K^T G_K}{G_{K-1}^T G_{K-1}} \cdot V_{K-1}    else    (2.101)

The adaptation gain is obtained from Lemma 2.07 and is
given by:

    \alpha_K = -\frac{H_K^T (R_{SS} - J_K R_{NN}) V_K}{V_K^T (R_{SS} - J_K R_{NN}) V_K}    (2.102)

Using definition (2.98), we obtain:

    \alpha_K = -\frac{H_K^T Q_K V_K}{V_K^T Q_K V_K}    (2.103)
The adaptive filter designed by the CGF method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function $J_K$:

$$J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K}$$

[Step 3] Compute the gradient $G_K$ of $J_K$ with respect to $H_K$:

$$G_K = \frac{2}{H_K^T R_{NN} H_K} \, Q_K H_K$$

where $Q_K$ is given by (2.98).

[Step 4] Compute the step direction vector $V_K$:

$$V_K = \begin{cases} -G_K & \text{if } K \bmod n = 0 \\ -G_K + \dfrac{G_K^T G_K}{G_{K-1}^T G_{K-1}} \, V_{K-1} & \text{else} \end{cases}$$

[Step 5] Compute the adaptation gain:

$$\alpha_K = -\frac{H_K^T Q_K V_K}{V_K^T Q_K V_K}$$

[Step 6] Update the filter vector $H_K$:

$$H_{K+1} = H_K + \alpha_K V_K$$

[Step 7] Test for stopping condition:

If $\dfrac{\|H_{K+1} - H_K\|}{\|H_K\|} \le \delta$ then terminate the adaptation. Otherwise go to step 2.

5. Pollack-Ribiere Conjugate Gradient Method (CGP)

The CGP method is similar to the CGF method. The only difference is the way the step direction is computed.

The CGP method uses the following expression to compute the step direction vector $V_K$:

$$V_K = \begin{cases} -G_K & \text{if } K \bmod n = 0 \\ -G_K + \dfrac{G_K^T (G_K - G_{K-1})}{G_{K-1}^T G_{K-1}} \, V_{K-1} & \text{else} \end{cases}$$

All the rest is identical to the CGF method. However, this method was found to converge much faster than the CGF method for all the images tested in this thesis.

The adaptive filter designed by the CGP method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function $J_K$:

$$J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K}$$

[Step 3] Compute the gradient $G_K$ of $J_K$ with respect to $H_K$:

$$G_K = \frac{2}{H_K^T R_{NN} H_K} \, Q_K H_K$$

where $Q_K$ is given by (2.98).

[Step 4] Compute the step direction vector $V_K$:

$$V_K = \begin{cases} -G_K & \text{if } K \bmod n = 0 \\ -G_K + \dfrac{G_K^T (G_K - G_{K-1})}{G_{K-1}^T G_{K-1}} \, V_{K-1} & \text{else} \end{cases}$$

[Step 5] Compute the adaptation gain:

$$\alpha_K = -\frac{H_K^T Q_K V_K}{V_K^T Q_K V_K}$$

[Step 6] Update the filter vector:

$$H_{K+1} = H_K + \alpha_K V_K$$

[Step 7] Test for stopping condition:

If $\dfrac{\|H_{K+1} - H_K\|}{\|H_K\|} \le \delta$ then terminate the adaptation, otherwise go to step 2.
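Since the CGF and CGP methods differ only in Step 4, the two step direction rules can be compared side by side. The sketch below is illustrative only; the function name `cg_direction`, its argument names and the sample vectors are assumptions made for this example.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cg_direction(K, n, G, G_prev=None, V_prev=None, method="CGF"):
    """Step 4: restart with -G_K when K mod n == 0, otherwise add beta * V_{K-1}."""
    if K % n == 0 or G_prev is None or V_prev is None:
        return [-g for g in G]
    if method == "CGF":
        # Fletcher-Reeves ratio of squared gradient norms
        beta = dot(G, G) / dot(G_prev, G_prev)
    elif method == "CGP":
        # CGP form: uses the change in the gradient
        diff = [g - gp for g, gp in zip(G, G_prev)]
        beta = dot(G, diff) / dot(G_prev, G_prev)
    else:
        raise ValueError("method must be 'CGF' or 'CGP'")
    return [-g + beta * v for g, v in zip(G, V_prev)]

# Hypothetical gradients at two consecutive steps of a 9-coefficient filter (n = 9),
# truncated to two components to keep the arithmetic visible.
G_prev, V_prev, G = [1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]
v_cgf = cg_direction(1, 9, G, G_prev, V_prev, method="CGF")   # beta = 5
v_cgp = cg_direction(1, 9, G, G_prev, V_prev, method="CGP")   # beta = 4
v_restart = cg_direction(9, 9, G, G_prev, V_prev)             # restart: -G
```

When consecutive gradients are nearly equal the CGP ratio becomes small, so the direction falls back toward the plain gradient direction, whereas the CGF ratio does not.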


6.  Davidon-Fletcher-Powell  Variable  Metric  Method  (DFP) 
The DFP method is applied to the MSNR adaptive filter in a similar way as for the mMSE adaptive filter. The major difference is the approximation of the adaptation gain and the need to evaluate the performance function at every iteration step K.

The adaptive filter based on the DFP method is updated by the following iterative scheme:

$$H_{K+1} = H_K + \alpha_K V_K$$

The step direction vector $V_K$ is obtained by the variable metric:

$$V_K = -A_K G_K \tag{2.105}$$

The adaptation gain is obtained from Lemma (2.07):

$$\alpha_K = -\frac{H_K^T Q_K V_K}{V_K^T Q_K V_K} \tag{2.106}$$

where $Q_K$ is given by (2.98). The metric $A_K$ is updated by the DFP iterative procedure:

$$A_{K+1} = A_K + \alpha_K \frac{V_K V_K^T}{V_K^T P_K} - \frac{A_K P_K P_K^T A_K}{P_K^T A_K P_K} \tag{2.107}$$

The vector $P_K$ in (2.107) is defined as:

$$P_K = G_{K+1} - G_K \tag{2.108}$$

The adaptive filter designed by the DFP method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and the starting correction metric $A_0 = I$ (where I is the identity matrix), compute the gradient $G_0$ of the performance function as before, and set the stopping bound $\delta$.

[Step 2] Compute the step direction vector $V_K$:

$$V_K = -A_K G_K$$

[Step 3] Compute the adaptation gain $\alpha_K$:

$$\alpha_K = -\frac{H_K^T Q_K V_K}{V_K^T Q_K V_K}$$

[Step 4] Update the filter vector $H_K$:

$$H_{K+1} = H_K + \alpha_K V_K$$

[Step 5] Compute the performance function $J_{K+1}$:

$$J_{K+1} = \frac{H_{K+1}^T R_{SS} H_{K+1}}{H_{K+1}^T R_{NN} H_{K+1}}$$

[Step 6] Compute the gradient $G_{K+1}$ of the performance function $J_{K+1}$ with respect to the filter vector $H_{K+1}$:

$$G_{K+1} = \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} \, Q_{K+1} H_{K+1}$$

where $Q_{K+1}$ is given by updating (2.98).

[Step 7] Compute the vector $P_K$ by (2.108):

$$P_K = G_{K+1} - G_K$$

[Step 8] Update the correction matrix $A_{K+1}$ by (2.107):

$$A_{K+1} = A_K + \alpha_K \frac{V_K V_K^T}{V_K^T P_K} - \frac{A_K P_K P_K^T A_K}{P_K^T A_K P_K} \tag{2.107}$$

[Step 9] Test for stopping condition.

This step can be done after step 4 to save some extra computations, but it was placed here to follow the same pattern as all other methods.

If $\dfrac{\|H_{K+1} - H_K\|}{\|H_K\|} \le \delta$ then terminate the adaptation procedure, otherwise go to step 2.
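The metric update of Step 8 is the part of the DFP procedure that is easiest to get wrong, so a small sketch may help. It assumes the update has the standard DFP form $A_{K+1} = A_K + \alpha_K V_K V_K^T / (V_K^T P_K) - A_K P_K P_K^T A_K / (P_K^T A_K P_K)$ and that $A_K$ is symmetric (true when starting from $A_0 = I$); the function name `dfp_update` and the numbers are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def dfp_update(A, alpha, V, P):
    """Return A_{K+1} from A_K, the gain alpha_K, the direction V_K and P_K = G_{K+1} - G_K.

    A is assumed symmetric, so A P P^T A equals the outer product of (A P) with itself.
    """
    AP = matvec(A, P)
    vp = dot(V, P)      # V_K^T P_K
    pap = dot(P, AP)    # P_K^T A_K P_K
    n = len(V)
    return [[A[i][j] + alpha * V[i] * V[j] / vp - AP[i] * AP[j] / pap
             for j in range(n)] for i in range(n)]

# Hypothetical values for one update.
A0 = [[1.0, 0.0], [0.0, 1.0]]
alpha0, V0, P0 = 2.0, [1.0, 0.0], [1.0, 1.0]
A1 = dfp_update(A0, alpha0, V0, P0)
# Quasi-Newton check: the updated metric maps the gradient change onto the step taken,
# that is, A_{K+1} P_K = alpha_K V_K.
step_from_metric = matvec(A1, P0)
```

The check in the last line holds for any data with nonzero denominators, which makes it a convenient test when implementing the update.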
7. Amir's Transform Method (AT)

From test results, which will be shown in (2.17 - 2.28), it was observed that a faster converging method would be helpful for designing the MSNR adaptive filter. Neither the conjugate gradient method by Pollack nor the variable metric method following Davidon exhibits the same convergence speed as in the quadratic mMSE case. The reason for the slower convergence is that the MSNR performance function (2.25) is nonlinear and nonquadratic. It was then decided to derive a method tailored for this performance function. The derivation of this method is based on the generalized eigenvalue/eigenvector problem introduced by the stationary points of the performance function J in (2.25). The stationary point of the performance function J in (2.25) satisfies (2.40), which can be written in the form:

$$(R_{SS} - J_{max} R_{NN}) \, H^* = 0 \tag{2.109}$$

where $H^*$ is the optimal filter vector which maximizes the performance function J.

The optimal filter $H^*$ satisfies

$$H^* = \frac{1}{J_{max}} \, R_{NN}^{-1} R_{SS} \, H^* \tag{2.110}$$

From equation (2.110), it is clear that an adaptive filter which uses the transform matrix $\frac{1}{J} R_{NN}^{-1} R_{SS}$ for updating the filter will satisfy (2.110) if it converges to the optimum.
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In  order  to  accelerate  the  convergence  of  such  an 
adaptive  filter,  a  gradient  search  is  added  to  update  the 
filter  vector.  The  steepest  descent  search  direction  is 
adopted.  The  "best  step"  concept  is  used  partially  to  com¬ 
pute  the  adaptation  gain. 

At iteration step K+1, the filter update equation is described by:

$$H_{K+1} = \frac{1}{J_K} R_{NN}^{-1} R_{SS} H_K + \alpha_K G_K \tag{2.111}$$

The transform matrix is defined as:

$$T_K \triangleq \frac{1}{J_K} R_{NN}^{-1} R_{SS} \tag{2.112}$$

Using (2.112) in (2.111), we obtain:

$$H_{K+1} = T_K H_K + \alpha_K G_K \tag{2.113}$$

The adaptation gain for the AT method can be obtained by Lemma 2.05:

$$G_{K+1}^T G_K = 0 \tag{2.57}$$

From (2.83) and (2.98), the gradient of $J_{K+1}$ is:

$$G_{K+1} = \frac{2}{H_{K+1}^T R_{NN} H_{K+1}} \, Q_{K+1} H_{K+1} \tag{2.114}$$

Using (2.113) and (2.114) in (2.57), we obtain:

$$\frac{2}{H_{K+1}^T R_{NN} H_{K+1}} \left[ Q_{K+1} (T_K H_K + \alpha_K G_K) \right]^T G_K = 0 \tag{2.115}$$

$$(Q_{K+1} T_K H_K + \alpha_K Q_{K+1} G_K)^T G_K = 0 \tag{2.116}$$

(2.116) can be viewed as a dot product between two orthogonal vectors, and the expression can be modified to be:

$$G_K^T (Q_{K+1} T_K H_K + \alpha_K Q_{K+1} G_K) = 0 \tag{2.117}$$

Solving (2.117) for $\alpha_K$, we obtain:

$$\alpha_K = -\frac{G_K^T Q_{K+1} T_K H_K}{G_K^T Q_{K+1} G_K} \tag{2.118}$$

Since $Q_{K+1}$ is not known, we use Lemma (2.07) to approximate $\alpha_K$:

$$\alpha_K = -\frac{H_K^T Q_K G_K}{G_K^T Q_K G_K} \tag{2.119}$$

In order for the adaptation gain $\alpha_K$ in (2.119) to be acceptable (i.e., for the adaptive filter to converge), it must satisfy the condition (2.90).

The adaptive filter designed by the AT method is carried out in the following steps:

[Step 1] Select a starting filter vector $H_0$ and a stopping bound $\delta$.

[Step 2] Compute the performance function as in (2.25):

$$J_K = \frac{H_K^T R_{SS} H_K}{H_K^T R_{NN} H_K}$$

[Step 3] Compute the gradient $G_K = \nabla_H J_K$:

$$G_K = \frac{2}{H_K^T R_{NN} H_K} \, Q_K H_K$$

[Step 4] Compute the adaptation gain:

$$\alpha_K = -\frac{H_K^T Q_K G_K}{G_K^T Q_K G_K}$$

[Step 5] Update the filter vector $H_K$ according to (2.113):

$$H_{K+1} = T_K H_K + \alpha_K G_K$$

[Step 6] Test for stopping condition:

If $\dfrac{\|H_{K+1} - H_K\|}{\|H_K\|} \le \delta$, then terminate the adaptation, otherwise go to step 2.
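To see what the transform part of the update does on its own, the sketch below applies only $\frac{1}{J_K} R_{NN}^{-1} R_{SS}$ at each step, that is, it sets the adaptation gain to zero. This is an assumption made to isolate the transform, not the full AT method; in this reduced form the update behaves like a scaled power iteration on $R_{NN}^{-1} R_{SS}$. The helper names `solve` and `transform_step` and the 2 x 2 matrices are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def transform_step(H, R_ss, R_nn):
    """Apply (1/J_K) R_NN^{-1} R_SS to H_K; return the new vector and J_K."""
    RsH = matvec(R_ss, H)
    J = dot(H, RsH) / dot(H, matvec(R_nn, H))
    return [x / J for x in solve(R_nn, RsH)], J

# Hypothetical example: the largest eigenvalue of R_NN^{-1} R_SS is 2.
R_ss = [[4.0, 0.0], [0.0, 1.0]]
R_nn = [[2.0, 0.0], [0.0, 1.0]]
H = [1.0, 0.5]
for _ in range(60):
    H, J = transform_step(H, R_ss, R_nn)
```

At the fixed point the vector is left unchanged by the transform, which is the stationary condition the method is built around; the gradient term of the full update is what is meant to shorten the path to that point.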

E.  CONVERGENCE  AND  CONVERGENCE  RATE  OF  THE  GRADIENT  METHODS 
1 .  SD  Adaptive  Filter 

Theorem  2.09 

For any starting filter vector $H_0$, the sequence $\{H_K\}$ of the adaptive filter given by (2.56) converges to the unique optimal solution $H^*$ given by (2.12). Furthermore, the rate of convergence satisfies

$$\rho \triangleq \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \le \beta \, \frac{1 - C}{1 + C} \tag{2.120}$$

where C is the condition number of the Hessian matrix $R_{XX}$ of the performance function J given in (2.10) and $\beta$ is a constant. The condition number is defined as

$$C = \frac{\lambda_S}{\lambda_L} \tag{2.121}$$

where $\lambda_L$, $\lambda_S$ are the largest and smallest eigenvalues of the Hessian matrix $R_{XX}$.

Proof 

The  Kantorovich  inequality  is  used  to  prove  the 

theorem. 

The functional $\Psi_K$ is defined as:

$$\Psi_K = (H_K - H^*)^T R_{XX} (H_K - H^*) \tag{2.122}$$

where $R_{XX}$ is the Hessian matrix of the performance function in (2.10). For the adaptive filter, $R_{XX}$ is the correlation matrix of the observed image signal.

$H^*$ is the filter which minimizes (2.10). The filter vector $H^*$ is called the optimal filter. $\Psi_K$ is updated at iteration step K + 1 as:

$$\Psi_{K+1} = (H_{K+1} - H^*)^T R_{XX} (H_{K+1} - H^*) \tag{2.123}$$

Using (2.56) to substitute for $H_{K+1}$ in (2.123), we have

$$\Psi_{K+1} = (H_K + \alpha_K G_K - H^*)^T R_{XX} (H_K + \alpha_K G_K - H^*) \tag{2.124}$$

Using the definition (2.122) and the fact that $R_{XX}$ is a symmetric matrix, we obtain:

$$\Psi_{K+1} = \Psi_K + 2\alpha_K G_K^T R_{XX} (H_K - H^*) + \alpha_K^2 G_K^T R_{XX} G_K \tag{2.125}$$

The adaptation gain $\alpha_K$ is given by (2.66), which can be used in (2.125) to obtain:

$$\Psi_{K+1} = \Psi_K + 2\alpha_K G_K^T R_{XX} (H_K - H^*) - \frac{1}{2} \alpha_K \frac{G_K^T G_K}{G_K^T R_{XX} G_K} \, G_K^T R_{XX} G_K$$

$$= \Psi_K + 2\alpha_K G_K^T R_{XX} (H_K - H^*) - \frac{1}{2} \alpha_K G_K^T G_K \tag{2.126}$$

Using (2.63) and (2.12) in (2.126), we obtain:

$$2 R_{XX} (H_K - H^*) = 2 (R_{XX} H_K - R_{XS}) + 2 (R_{XS} - R_{XX} H^*) = G_K + 0 = G_K \tag{2.127}$$

Substitution of (2.127) in (2.126) gives:

$$\Psi_{K+1} = \Psi_K + \alpha_K G_K^T G_K - \frac{1}{2} \alpha_K G_K^T G_K = \Psi_K + \frac{1}{2} \alpha_K G_K^T G_K \tag{2.128}$$

Let us define the vector $E_K$ as

$$E_K \triangleq H_K - H^* \tag{2.129}$$

Using this definition (2.129) in (2.122), we obtain:

$$\Psi_K = E_K^T R_{XX} E_K \tag{2.130}$$

From (2.127), we obtain:

$$E_K = \frac{1}{2} R_{XX}^{-1} G_K \tag{2.131}$$

Using (2.131) and the fact that $R_{XX}$ is symmetric, (2.130) becomes:

$$\Psi_K = \frac{1}{4} G_K^T R_{XX}^{-1} G_K \tag{2.132}$$

Using (2.132) in (2.128), we obtain:

$$\frac{\Psi_{K+1} - \Psi_K}{\Psi_K} = \frac{\frac{1}{2} \alpha_K G_K^T G_K}{\frac{1}{4} G_K^T R_{XX}^{-1} G_K} \tag{2.133}$$

Substitution of $\alpha_K$ given by (2.66) in (2.133) gives:

$$\frac{\Psi_{K+1} - \Psi_K}{\Psi_K} = -\frac{(G_K^T G_K)^2}{(G_K^T R_{XX} G_K)(G_K^T R_{XX}^{-1} G_K)} \tag{2.134}$$

Now the Kantorovich inequality is used:

$$\frac{(G_K^T R_{XX} G_K)(G_K^T R_{XX}^{-1} G_K)}{(G_K^T G_K)^2} \le \frac{(\lambda_S + \lambda_L)^2}{4 \lambda_S \lambda_L} \tag{2.135}$$

where $\lambda_S$ and $\lambda_L$ are the smallest and largest eigenvalues of the matrix $R_{XX}$.

Using (2.135) in (2.134), we obtain:

$$\frac{\Psi_{K+1} - \Psi_K}{\Psi_K} \le -\frac{4 \lambda_S \lambda_L}{(\lambda_S + \lambda_L)^2} \tag{2.136}$$

$$\frac{\Psi_{K+1}}{\Psi_K} \le \left( \frac{\lambda_L - \lambda_S}{\lambda_L + \lambda_S} \right)^2 < 1 \tag{2.137}$$

Again, the condition number of the matrix $R_{XX}$ is defined as

$$C \triangleq \frac{\lambda_S}{\lambda_L} \tag{2.138}$$

(2.137) becomes

$$\frac{\Psi_{K+1}}{\Psi_K} \le \left( \frac{1 - C}{1 + C} \right)^2 < 1 \tag{2.139}$$

Since $R_{XX}$ is a positive definite matrix, the sequence $\{\Psi_K\}$ is a positive sequence.

Let us define

$$q^2 \triangleq \left( \frac{1 - C}{1 + C} \right)^2 < 1 \tag{2.140}$$

we obtain

$$\Psi_{K+1} \le q^2 \, \Psi_K \tag{2.141}$$

From (2.141), we can see that when $K \to \infty$, the sequence $\{\Psi_K\}$ converges to zero. The reason is that $\{\Psi_K\}$ is a decreasing positive sequence bounded above by $q^{2K} \Psi_0$ with $q^2 < 1$, thus $\Psi_\infty = 0$. It implies $E_\infty = 0$ (use (2.130) to justify this statement). Since $E_K$ is defined as $H_K - H^*$ in (2.129), we conclude that:

$$H_\infty - H^* = 0 \quad \text{or} \quad H_\infty = H^* \tag{2.142}$$

This completes the proof of convergence.

From (2.139), we observe that the rate of convergence of the sequence $\{\Psi_K\}$ is given by (2.140). However, since $\Psi_K$ as defined in (2.122) is a quadratic function of the vector $H_K - H^*$, it satisfies the relation

$$\frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \le \beta \, \frac{1 - C}{1 + C} \tag{2.144}$$

where $\beta$ is a positive number.
Thus we obtain:

$$\rho \le \beta \, \frac{1 - C}{1 + C} \tag{2.145}$$

(Q.E.D.)

Theorem (2.09) proves that the SD method exhibits at least linear convergence.
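The bound used in the proof can be checked numerically on a small quadratic. The sketch below assumes a performance function of the form $J = H^T R_{XX} H - 2 H^T R_{XS}$ plus a constant, with gradient $G = 2(R_{XX} H - R_{XS})$, the update $H_{K+1} = H_K + \alpha_K G_K$ with the best-step gain $\alpha_K = -G_K^T G_K / (2 G_K^T R_{XX} G_K)$, and the condition number taken as the ratio of smallest to largest eigenvalue. The matrix, starting vector and function names are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sd_step(H, R_xx, R_xs):
    """One best-step steepest descent update on the quadratic performance function."""
    G = [2.0 * (r - s) for r, s in zip(matvec(R_xx, H), R_xs)]
    alpha = -dot(G, G) / (2.0 * dot(G, matvec(R_xx, G)))
    return [h + alpha * g for h, g in zip(H, G)]

def psi(H, H_opt, R_xx):
    """Error functional (H - H*)^T R_XX (H - H*)."""
    E = [h - o for h, o in zip(H, H_opt)]
    return dot(E, matvec(R_xx, E))

R_xx = [[1.0, 0.0], [0.0, 4.0]]   # eigenvalues 1 and 4, so C = 1/4
R_xs = [0.0, 0.0]                 # optimum H* = 0 for this choice
H_opt = [0.0, 0.0]
C = 0.25
bound = ((1.0 - C) / (1.0 + C)) ** 2

H = [4.0, 1.0]                    # a worst-case start for this matrix
ratios = []
for _ in range(5):
    H_next = sd_step(H, R_xx, R_xs)
    ratios.append(psi(H_next, H_opt, R_xx) / psi(H, H_opt, R_xx))
    H = H_next
```

For this starting vector every ratio sits at the bound (0.36 up to rounding), which illustrates why an ill-conditioned correlation matrix slows the SD method: as C approaches zero the bound approaches one.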


Definition

An algorithm with the property that

$$\frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} = \text{constant}$$

is said to exhibit linear convergence.

The linear convergence is sometimes called geometric convergence since it follows from the definition that for large K, j:

$$\|H_K - H^*\| \approx (\text{constant})^{K-j} \, \|H_j - H^*\|$$

The speed of convergence of the SD method is a function of the condition number C. The more ill-conditioned $R_{XX}$, the slower will be the rate of convergence.

Theorem (2.09) used the mMSE quadratic function.

For the MSNR performance function, it was shown (II.E.1) that the sequence $\{H_K\}$ generated by the SD method converges. Test results showed that the convergence of SD is slower for MSNR than for mMSE.

2. ASD Adaptive Filter

The algorithm is illustrated in Fig. 2.01.

Fig. 2.01 The ASD algorithm.

The steepest descent steps are labeled SD, the accelerated steps are labeled ASD.

Shah, Buehler and Kempthorne [53] showed that for an n dimensional quadratic function, the sequence of iterates $H_0$, $H_2$, ... is identical to the full sequence of iterates generated by the conjugate gradient descent. Since the conjugate gradient descent takes no more than n steps to reach the minimum of the n dimensional quadratic function, the accelerated steepest descent takes no more than (2n-1) steps.

Applying the ASD method to design a multidimensional adaptive filter using real test scene images has shown poor convergence speed for both the mMSE and MSNR performance functions. The reason is error propagation: the method is sensitive to accumulated errors, and once they are present the iterates no longer satisfy the condition for accelerated convergence.

3.  CGF  Adaptive  Filter 

The conjugate gradient methods CGP and CGF exhibit quadratic termination (apart from rounding errors) for the mMSE performance function. Quadratic termination means that for a quadratic performance function it is guaranteed that the minimum will be located exactly (apart from rounding errors) in no more than n steps. However, for nonquadratic functions like (2.25) the conjugate gradient method does not exhibit quadratic termination. For the infinite dimensional case, Daniel [60] showed that the rate of convergence is:

$$\rho = \frac{\|H_{K+1} - H^*\|}{\|H_K - H^*\|} \le \frac{\sqrt{\lambda_L} - \sqrt{\lambda_S}}{\sqrt{\lambda_L} + \sqrt{\lambda_S}}$$

where $\lambda_L$, $\lambda_S$ are the largest and smallest eigenvalues of the Hessian matrix of the performance function J.

Depending on the approach used to design the adaptive filter for a nonlinear, nonquadratic performance function, different rates of convergence can be obtained. Some approaches exhibit quadratic convergence (those which approximate the performance function by a Taylor series expansion). Others exhibit superlinear convergence.

Theorem 2.10

Let the performance function J be defined by (2.10) and the adaptive filter be designed using the conjugate gradient method; then the sequence of adaptive filters $\{H_K\}$ converges in no more than n steps to the unique minimum $H^*$ of the performance function J.

Proof

The proof is based on the fact that both methods, CGF and CGP, are based on the conjugate direction search method, which implies that the step direction vector $V_K$ is orthogonal to the gradient of the performance function J at iteration step K+1. This fact is stated as: [5S, 61]

$$G_{K+1}^T V_K = 0 \tag{2.146}$$

The adaptation equation is:

$$H_{K+1} = H_K + \alpha_K V_K \tag{2.147}$$

Its expression at the iteration step n can be related to all steps from iteration step K by:

$$H_n = H_{K+1} + \sum_{j=K+1}^{n-1} \alpha_j V_j \tag{2.148}$$

for any $0 \le K \le n - 1$.

The gradient of the performance function J at iteration step n is given by:

$$G_n = 2 (R_{XX} H_n - R_{XS}) \tag{2.149}$$

By substituting (2.148) in (2.149), we get:

$$G_n = G_{K+1} + 2 \sum_{j=K+1}^{n-1} \alpha_j R_{XX} V_j \tag{2.150}$$

Using equation (2.146) in (2.150), we obtain:

$$V_K^T G_n = 2 \sum_{j=K+1}^{n-1} \alpha_j V_K^T R_{XX} V_j \tag{2.151}$$

The method of conjugate gradient is based on generating a conjugate sequence of step direction vectors $\{V_K\}$.

The conjugacy condition satisfies:

$$V_K^T R_{XX} V_j = 0 \quad \text{for } K \ne j \tag{2.152}$$

We use (2.152) in (2.151) to show that:

$$V_K^T G_n = 0 \tag{2.153}$$

The step direction vectors $V_0, V_1, \ldots, V_{n-1}$ form a complete conjugate basis. Therefore, at iteration step n, the only $G_n$ which satisfies (2.153) is

$$G_n = 0 \tag{2.154}$$

But for the quadratic performance function, the gradient vanishes only at the minimum. So we proved that the method converges to the minimum of J in (2.10) in no more than n steps.

Substituting (2.154) in (2.149), we obtain

$$R_{XX} H_n - R_{XS} = 0 \tag{2.155}$$

$$H_n = R_{XX}^{-1} R_{XS} \tag{2.156}$$

$$H_n = H^* \tag{2.157}$$

Thus the filter converges to the unique minimum of J.

Q.E.D.
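The n-step termination stated by the theorem can be observed on a small quadratic. The following sketch is an illustration under stated assumptions: the performance function is taken as $J = H^T R_{XX} H - 2 H^T R_{XS}$ plus a constant, the Fletcher-Reeves direction is used without restart, and the gain is the exact minimizer along $V_K$, $\alpha_K = -G_K^T V_K / (2 V_K^T R_{XX} V_K)$. The matrix, vectors and function names are hypothetical.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gradient(H, R_xx, R_xs):
    """Gradient of the quadratic: G = 2 (R_XX H - R_XS)."""
    return [2.0 * (r - s) for r, s in zip(matvec(R_xx, H), R_xs)]

def cg_quadratic(R_xx, R_xs, H):
    """Run n Fletcher-Reeves conjugate gradient steps with exact line search."""
    n = len(H)
    G = gradient(H, R_xx, R_xs)
    V = [-g for g in G]
    for _ in range(n):
        if dot(G, G) == 0.0:          # already at the minimum
            break
        alpha = -dot(G, V) / (2.0 * dot(V, matvec(R_xx, V)))
        H = [h + alpha * v for h, v in zip(H, V)]
        G_new = gradient(H, R_xx, R_xs)
        beta = dot(G_new, G_new) / dot(G, G)
        V = [-g + beta * v for g, v in zip(G_new, V)]
        G = G_new
    return H, G

# Hypothetical 2 x 2 problem; the minimum is H* = R_XX^{-1} R_XS = (1/11, 7/11).
R_xx = [[4.0, 1.0], [1.0, 3.0]]
R_xs = [1.0, 2.0]
H_n, G_n = cg_quadratic(R_xx, R_xs, [2.0, 1.0])
```

With n = 2 the gradient is zero, up to rounding, after the second step, which is the behaviour that round-off errors can spoil for larger filters.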

In practical applications, it was found that the conjugate gradient methods sometimes take more than n steps to converge. The reason is round-off error. The two conditions (2.146) and (2.152) are then not satisfied, so the sequence $\{V_K\}$ of step directions does not form a complete basis in n iteration steps.

For the MSNR cases, the adaptive filter could not converge as fast as in the mMSE cases because the performance function J in the MSNR case is nonquadratic.
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4.  The  DFP  Method 


The variable metric DFP method exhibits quadratic termination, apart from rounding errors, for the mMSE performance function.

Fletcher and Powell [58] proved for a general performance function J that a positive definite variable metric $A_K$ implies a positive definite $A_{K+1}$, updated by (2.77). They showed that for a quadratic function like the mMSE type, successive filter updates $\Delta H_0, \Delta H_1, \ldots, \Delta H_{n-1}$ form a set of conjugate directions, and $A_n = R_{XX}^{-1}$, so the DFP algorithm exhibits quadratic termination in n steps.

The MSNR performance function is nonquadratic and nonlinear, so the DFP method cannot exhibit quadratic termination. But according to our test results, it is still a fast convergence technique. If the method converges slowly, it is recommended to restart the variable metric every n+1 steps by setting $A_{K+1} = I$, to overcome round-off errors.

5.  The  AT  Adaptive  Filter 

The  Amir  transform  adaptive  filter  exhibits  very 
fast  convergence  speed.  The  reason  lies  in  the  way  it  was 
designed.  Each  iterative  step  uses  a  transform  to  satisfy 
the  generalized  eigenvalue  and  eigenvector  steady  state  equation. 

Theorem 2.11

Let the adaptive filter be updated by (2.111) and the performance function be defined by (2.25). Then the filter $H_K$ converges to the unique optimal filter $H^*$ if the adaptation gain $\alpha_K$ satisfies condition (2.90).

The adaptive filter is updated by (2.111):

$$H_{K+1} = \frac{1}{J_K} R_{NN}^{-1} R_{SS} H_K + \alpha_K G_K \tag{2.111}$$

Substituting (2.83) for $G_K$ in (2.111), and defining

$$\delta_K \triangleq \frac{2 \alpha_K}{H_K^T R_{NN} H_K} \tag{2.158}$$

we obtain:

$$H_{K+1} = \frac{1}{J_K} R_{NN}^{-1} R_{SS} H_K + \delta_K (R_{SS} - J_K R_{NN}) H_K \tag{2.159}$$

Rearranging (2.159) gives:

$$H_{K+1} = \left[ \frac{1}{J_K} R_{NN}^{-1} R_{SS} + \delta_K (R_{SS} - J_K R_{NN}) \right] H_K \tag{2.160}$$

Subtracting $H_K$ from both sides, we obtain:

$$H_{K+1} - H_K = \left( I + \frac{2 \alpha_K J_K}{H_K^T R_{NN} H_K} R_{NN} \right) \left( \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right) H_K \tag{2.161}$$

I is the identity matrix.

Let us define the matrix $Z_K$ as:

$$Z_K \triangleq I + \frac{2 \alpha_K J_K}{H_K^T R_{NN} H_K} R_{NN} \tag{2.162}$$

Since $R_{NN}$, $R_{SS}$ are positive definite and $\alpha_K$, $J_K$, $\delta_K$ are positive numbers, $Z_K$ is positive definite. Since $\alpha_K$, $J_K$ are bounded, the norm of the matrix $Z_K$ is bounded. In other words, there exists a positive number $\chi$ such that:

$$\|Z_K\| \le \chi \tag{2.163}$$

Taking the norm of (2.161) and using the inequality

$$\|A \cdot B\| \le \|A\| \cdot \|B\| \tag{2.164}$$

where A, B are matrices, and combining with (2.163), we obtain:

$$\|H_{K+1} - H_K\| \le \chi \cdot \left\| \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right\| \cdot \|H_K\| \tag{2.165}$$

which turns out to be

$$\frac{\|H_{K+1} - H_K\|}{\|H_K\|} \le \chi \cdot \left\| \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right\| \tag{2.166}$$

If the convergence error $e_K$ is defined as

$$e_K \triangleq \frac{\|H_{K+1} - H_K\|}{\|H_K\|} \tag{2.167}$$

then

$$e_K \le \chi \cdot \left\| \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right\| \tag{2.168}$$

The largest eigenvalue of $\frac{1}{J_K} R_{NN}^{-1} R_{SS} - I$ is

$$\frac{J_{max}}{J_K} - 1 \tag{2.169}$$

where $J_{max}$ is the largest eigenvalue of $R_{NN}^{-1} R_{SS}$. But

$$\left\| \frac{1}{J_K} R_{NN}^{-1} R_{SS} - I \right\| = \frac{J_{max}}{J_K} - 1 \tag{2.170}$$

So

$$e_K \le \chi \left( \frac{J_{max}}{J_K} - 1 \right) \tag{2.171}$$

The adaptation gain $\alpha_K$ is designed to satisfy condition (2.90), which states that:

$$J_{K+1} - J_K > 0$$

Updating (2.171) and using condition (2.90), we have:

I. $e_{K+1} \le \chi \left( \dfrac{J_{max}}{J_{K+1}} - 1 \right)$

II. $e_K \le \chi \left( \dfrac{J_{max}}{J_K} - 1 \right)$ (2.172)

III. $J_K < J_{K+1}$

Thus, the sequence $\{e_K\}$ converges to zero, because $J_{max}$ is the maximum value of the unimodal performance function, and the sequence $\{J_K\}$ is an ascending sequence bounded by the upper bound $J_{max}$, so

$$\lim_{K \to \infty} e_K \le \chi \lim_{K \to \infty} \left( \frac{J_{max}}{J_K} - 1 \right) = \chi \left( \frac{J_{max}}{J_{max}} - 1 \right) = 0 \tag{2.173}$$

This proves that the filter converges. At the convergence point, (2.170) satisfies

$$\left\| \frac{1}{J_\infty} R_{NN}^{-1} R_{SS} - I \right\| = 0 \tag{2.174}$$

or, in the vector form

$$\left( \frac{1}{J_\infty} R_{NN}^{-1} R_{SS} - I \right) H_\infty = 0 \tag{2.175}$$

(2.175) is the equation for the stationary points of the performance function J. Thus, $J_\infty = J_{max}$ and correspondingly $H_\infty = H^*$.

So the adaptive filter converges to the unique optimum.

(Q.E.D.)

F.  PRESENTATION  OF  RESULTS 

1.  Organization  of  Results 

The  performances  of  both  mMSE  and  MSNR  nonrecursive 
adaptive  spatial  filters  have  been  extensively  evaluated  on 
two  real  world  infrared  images,  shown  in  Fig.  2.1  and  2.2. 
Before  the  detailed  presentation  of  these  results,  a  detailed 
description  is  given  of  how  the  evaluations  are  organized. 

(a)  Filter  type: 

-  Nonrecursive  adaptive  spatial  filter 

-  Search  box  (filter  size)  3  by  3  pixels  with 

the  estimation  pixel  in  the  middle  of  the  filter 

(b)  Optimization  criterion  and  performance  function: 

-  mMSE:  Minimization  of  mean  square  error 

-  MSNR:  Maximization  of  signal  to  noise  ratio 

(c)  Adaptation  equation: 

1.  LMS  approach: 

HK+1  "  — K  + 
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Fig.  2.1  A  9  level 
computer  print  of 
Indiana  infrared 
test  image. 
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Fig.  2.2  A  9  level 
computer  print  of 
China  Lake  infrared 
test  image. 
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(d)  Search  methods: 

1.  LMS  approach: 

Steepest  descent  method 

2.  Gradient  approaches: 

Steepest  descent  method 
Accelerated  steepest  descent  method 
Amir's  method  (apply  only  to  mMSE  case) 

3.  Conjugate  gradient  approaches: 

Fletcher-Reeves  method 
Pollack  method 


4.  Variable  metric  approaches: 

Davidon-Fletcher-Powell  method 


5.  Amir's  transform  approach: 

Apply  only  to  MSNR  case 
(e)  Test  images  used: 

1.  Indiana  image  (Fig.  2.1): 

32  x  32  pixels 

Blue  spike  infrared  spectral  band 




An  image  taken  from  a  city  in  Indiana 
and  used  quite  extensively  as  a  standard 
test  image  for  high  altitude  downward 
looking  infrared  surveillance  system. 

2.  China  Lake  image  (Fig.  2.2): 

32  x  32  pixels 

Thermal infrared band in 10-13 μ range

An  image  taken  from  a  desert  area  in 
China  Lake,  California  with  a  highway 
in  the  picture.  It  has  been  used  as  one 
of  the  standard  test  images  for  short 
distance  side  looking  infrared  target 
acquisition  system. 

(f) Performance evaluation:

The  performance  of  the  adaptive  filters  is  presented 
in  four  different  ways,  all  as  a  function  of  the 
number  of  iterative  steps  N. 

1.  Filter  coefficients  as  a  function  of  N. 

(9  coefficients  for  a  3  x  3  spatial  filter) 

2.  Output  variance  as  a  function  of  N. 

3. Processing gain as a function of N. The processing gain is defined as follows:

$$PG = 10 \log \frac{m_i^2 + \sigma_i^2}{m_o^2 + \sigma_o^2}$$

where $m_i$, $m_o$ = means of the input and filtered images respectively.

$\sigma_i^2$, $\sigma_o^2$ = variances of the input and filtered images respectively.

4. Output signal to noise ratio (used only in MSNR cases):

Output SNR of the filtered image is defined as follows:

$$SNR = \frac{H^T R_{SS} H}{H^T R_{NN} H}$$

where $H$ = the filter vector

$R_{SS}$ = target signal correlation matrix
$R_{NN}$ = clutter noise correlation matrix
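The two scalar measures above are simple to compute once the images and correlation matrices are available. The sketch below is a rough illustration under assumptions that are not confirmed by the source: the processing gain is taken as the ratio, in dB, of the input to output mean-square value (squared mean plus variance), the variance is the population variance, and images are passed as flat lists of pixel values. All function names and numbers are hypothetical.

```python
import math

def mean_and_variance(pixels):
    """Mean and population variance of a flat list of pixel values."""
    m = sum(pixels) / len(pixels)
    var = sum((p - m) ** 2 for p in pixels) / len(pixels)
    return m, var

def processing_gain_db(input_pixels, output_pixels):
    """Assumed form: 10 log10 of (m_i^2 + var_i) / (m_o^2 + var_o)."""
    m_i, var_i = mean_and_variance(input_pixels)
    m_o, var_o = mean_and_variance(output_pixels)
    return 10.0 * math.log10((m_i ** 2 + var_i) / (m_o ** 2 + var_o))

def output_snr(H, R_ss, R_nn):
    """Ratio of H^T R_SS H to H^T R_NN H for a filter vector H."""
    def quad(R):
        return sum(H[i] * R[i][j] * H[j]
                   for i in range(len(H)) for j in range(len(H)))
    return quad(R_ss) / quad(R_nn)

# Tiny made-up data: input power 5, output power 0.5, so the gain is 10 dB.
pg = processing_gain_db([1.0, 3.0], [0.0, 1.0])
snr = output_snr([1.0, 0.0], [[4.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]])
```

A larger processing gain means more of the background has been suppressed, while the output SNR is only meaningful when separate target and clutter correlation matrices are available, as in the MSNR cases.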

2 .  Results  of  mMSE  Adaptive  Spatial  Filters 
I  -  Indiana  Image 

The  test  results  of  adaptive  filters  based  on  the 
mMSE  criterion  and  using  Indiana  test  image  are  presented 
in  the  following  figures: 

Fig.  2.3  -  LMS  approach,  steepest  descent  method 

Fig.  2.4  -  Gradient  approach,  steepest  descent  method 

Fig.  2.5  -  Gradient  approach,  accelerated  steepest 
descent  method 

Fig.  2.6  -  Gradient  approach,  Amir's  method 

Fig.  2.7  -  Conjugate  gradient  approach,  Fletcher-Reeves 
method 

Fig.  2.8  -  Conjugate  gradient  approach,  Pollack 
method 

Fig.  2.9  -  Variable  metric  approach,  Davidon-Fletcher- 
Powell  method. 

In  each  figure  three  results  -  the  nine  filter 
coefficients,  output  variance  and  processing  gain  -  are 
presented  as  a  function  of  iteration  steps. 
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Fig.  2.3b 
Table II-1 Results of mMSE Adaptive Spatial Filter (Indiana Image)


These  results  are  exactly  the  same  as  that  of  the  optimum 
mMSE  filter. 


The following additional numerical results are presented in Table II-1:

-  Processing  gain 

-  Mean  of  the  filtered  image 

-  Variance  of  the  filtered  image 

-  Number  of  iteration  steps  to  go  below  the  prescribed 
error 

-  Actual  adaptation  error  when  the  adaptation  is 
stopped. 

a.  Discussion 

These results will be discussed in several groups.

(1) LMS Approach and Steepest Descent Method. This approach is the two dimensional extension of the most widely used adaptive filter technique. In Fig. 2.3, we can see that the adaptation took close to one thousand steps to reach the minimum of the output variance and the maximum of the processing gain. However, the adaptation never achieved a steady state, even after 10,000 iteration steps.

Further, there is a steady state deviation from the optimum output variance. It is known as the "misadjustment," which commonly occurs in the traditional adaptive filter approach [23].

We believe these problems are the consequences of the basic assumptions of the LMS algorithm. The reasons probably are not obvious if we follow the traditional adaptive concept, which was initiated by Prof. Widrow using the error signal concept in control, $e = H^T X - d$. The filtered output $H^T X$ is compared with a desired result, d. Their difference, e, is used together with a constant, but adjustable, adaptive gain, $2\mu$, to form a correction term, $\Delta H$, for the filter coefficients to approach the optimization goal, which is the minimization of mean square error.

On the other hand, if we consider the adaptation procedure as an optimization process, then the adaptation equation takes the form of

$$H_{K+1} = H_K + \Delta H_K = H_K + \alpha_K G_K$$

where $G_K$ is called the "gradient" and $\alpha_K$ is called the "step size." The concept of gradient means the gradient of the performance function surface, J. The product of the adaptation step size $\alpha_K$ and the gradient $G_K$ is the correction term $\Delta H$.

It is postulated that the assumptions made by the LMS approach are not directly responsive to the goal of adaptation because the error term $H^T X - d$ is not directly related to the minimization of the performance function. Further, the assumption that the adaptive gain $2\mu$, which corresponds to the concept of step size in optimization, is constant does not coincide with the fact that the iterative steps toward optimization usually take place in varying step sizes. These problems contributed to the slow convergence and the steady state misadjustment in the LMS adaptive spatial filters.

We developed several adaptive filters using gradient methods developed in the optimization field. Their results are discussed in the following.

(2) Gradient Approaches. Three different methods were developed.
Their results are shown in Figures 2.4, 2.5, and 2.6 for the
steepest descent (SD), accelerated steepest descent (ASD) and
Amir's (AMM) methods respectively.

The reasoning described above is supported by the following
observations:

a. The convergence of adaptation is faster. It took 541, 445,
and 220 steps for the SD, ASD and AMM methods, respectively,
to bring the adaptation error below the stopping condition
of $1.5 \times 10^{-11}$, as shown in Table II-1.

b. The adaptation procedure indeed reached steady state once
the adaptation error was less than the stopping condition.

c. The steady state error is smaller than that of the LMS
algorithm, as shown in Table II-1. In fact, the output
variance is equal to that of the optimum filter.

(3) Conjugate Gradient Approaches. Two different methods were
developed. Their results are shown in Figures 2.7 and 2.8 for
the Fletcher-Reeves (CGF) and the Pollack (CGP) methods
respectively.

Again, the improvements are clearly seen. In fact, they are even
better than those of the gradient methods. Convergence took only
66 and 10 steps for the CGF and CGP methods, respectively, to
reach below the stopping condition of $1.5 \times 10^{-11}$. At
the same time, the output variance is the same.
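For readers unfamiliar with the two conjugate gradient variants, here is a textbook sketch on an assumed quadratic surface J(H) = H^T R H - 2 P^T H. It is not the thesis implementation: R, P, the function name and the stopping rule are hypothetical, and the "PR" branch uses the standard Polak-Ribiere formula, which is presumed to correspond to the CGP method named above.

```python
import numpy as np

def conjugate_gradient(R, P, H0, variant="FR", tol=1e-12):
    """Minimise J(H) = H^T R H - 2 P^T H with conjugate search directions."""
    H = np.asarray(H0, dtype=float).copy()
    g = 2.0 * (R @ H - P)          # gradient at the starting point
    d = -g                         # first direction is steepest descent
    steps = 0
    for _ in range(10 * len(P)):
        if np.linalg.norm(g) < tol:
            break
        alpha = -float(g @ d) / (2.0 * float(d @ R @ d))   # exact line search
        H = H + alpha * d
        g_new = 2.0 * (R @ H - P)
        if variant == "FR":        # Fletcher-Reeves
            beta = float(g_new @ g_new) / float(g @ g)
        else:                      # Polak-Ribiere
            beta = float(g_new @ (g_new - g)) / float(g @ g)
        d = -g_new + beta * d      # new direction, conjugate to the old ones
        g = g_new
        steps += 1
    return H, steps

if __name__ == "__main__":
    R = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    P = np.array([1.0, 2.0, 3.0])
    for variant in ("FR", "PR"):
        print(variant, conjugate_gradient(R, P, np.zeros(3), variant))
```

On an exactly quadratic surface with n coefficients, both variants need about n line searches in exact arithmetic; step counts on real image statistics depend on the stopping rule and are not predicted by this sketch.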

(4) Variable Metric Approach. Results of this approach, which is
extended from the one dimensional work of Davidon-Fletcher-Powell,
are shown in Fig. 2.9. Again, the improvements are clearly seen.
The background suppression result, measured by the output variance
and processing gain, is the same. But the convergence speed is
even better: it took only 9 iteration steps to reach below the
stopping condition.
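The variable metric idea can be illustrated with the standard Davidon-Fletcher-Powell update of an inverse-Hessian estimate. The sketch below assumes the same hypothetical quadratic surface J(H) = H^T R H - 2 P^T H and an exact line search; it is a textbook form and not necessarily the two-dimensional extension developed in the thesis.

```python
import numpy as np

def dfp(R, P, H0, tol=1e-12):
    """Minimise J(H) = H^T R H - 2 P^T H with the DFP variable metric update."""
    n = len(P)
    H = np.asarray(H0, dtype=float).copy()
    B = np.eye(n)                  # running estimate of the inverse Hessian
    g = 2.0 * (R @ H - P)
    steps = 0
    for _ in range(10 * n):
        if np.linalg.norm(g) < tol:
            break
        d = -B @ g                                          # metric-scaled direction
        alpha = -float(g @ d) / (2.0 * float(d @ R @ d))    # exact line search
        s = alpha * d                                       # change in coefficients
        H = H + s
        g_new = 2.0 * (R @ H - P)
        y = g_new - g                                       # change in gradient
        By = B @ y
        B = B + np.outer(s, s) / float(s @ y) - np.outer(By, By) / float(y @ By)
        g = g_new
        steps += 1
    return H, steps

if __name__ == "__main__":
    R = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    P = np.array([1.0, 2.0, 3.0])
    print(dfp(R, P, np.zeros(3)))
```

The matrix `B` accumulates curvature information from successive gradient differences, which is why the method can outpace plain gradient steps on a strongly elongated surface.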

3. Results of mMSE Adaptive Spatial Filter II -
China Lake Image

The  test  results  of  adaptive  filters  based  on  the  mMSE 
criterion  and  using  the  China  Lake  test  image  are  presented 
in  the  following  figures: 

Fig. 2.10 - LMS approach, steepest descent method

Fig. 2.11 - Gradient approach, steepest descent method

Fig. 2.12 - Gradient approach, accelerated steepest descent method

Fig. 2.13 - Gradient approach, Amir's method

Fig. 2.14 - Conjugate gradient approach, Fletcher-Reeves method

Fig. 2.15 - Conjugate gradient approach, Pollack method

Fig. 2.16 - Variable metric approach, Davidon-Fletcher-Powell method

In  each  figure,  three  results  are  presented  as 
functions  of  iteration  steps:  filter  coefficients,  output 
variance  and  processing  gain. 

Further, additional results are summarized and presented in
Table II-2:

- Processing gain

- Mean of the filtered image

- Variance of the filtered image

- Number of iteration steps to go below the prescribed
stopping error

- Actual adaptation error when the adaptation is stopped.


a.  Discussion 

The results using the China Lake image are generally similar to
those using the Indiana image. Only the important features will
be summarized below.

(1) LMS Approaches. The adaptation based on the LMS approach
again shows three problems: slow convergence, failure to reach
steady state, and misadjustment.


(2) New Approaches Developed in this Thesis. All new approaches
achieve the same steady state performance, equal to that of the
optimum filter, as shown in Table II.2:

Mean of the filtered image = $6.495 \times 10^{-5}$

Variance of the filtered image = $1.2 \times 10^{-2}$


However, they converge to the steady state value in far fewer
steps, as also shown in Table II.2.

Therefore, test results on the China Lake image again demonstrate
the improvements in adaptive filters using the approaches
suggested in this thesis.

It is interesting to note that the effectiveness of background
clutter suppression in the case of the China Lake image is not
as good as that in the case of the Indiana image. For example,
the processing gain for the China Lake image is 19.32 dB compared
with 29.874 dB for the Indiana image. We believe this difference
is related to the spatial correlation of the image. The higher
the correlation, the better the background clutter suppression.
The Indiana image is more spatially correlated than the China
Lake image.
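The link between correlation and clutter suppression can be illustrated with a deliberately simple model that is not taken from the thesis: a one-dimensional, first-order Markov background in which adjacent pixels have correlation rho. For that model the best linear estimate of a pixel from one neighbor leaves a residual variance of (1 - rho^2) times the input variance, so the variance-ratio gain in dB is -10 log10(1 - rho^2). The function name and the model are assumptions made only for this illustration.

```python
import math

def prediction_gain_db(rho):
    """Variance-ratio gain (dB) of the optimum one-neighbour predictor
    for an assumed first-order Markov background with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -10.0 * math.log10(1.0 - rho * rho)

if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99):
        print(f"rho = {rho:4.2f}  gain = {prediction_gain_db(rho):6.2f} dB")
```

The gain grows without bound as rho approaches one, which is qualitatively consistent with the observation that the more correlated image is suppressed more effectively. The numbers from this toy model are not comparable with the processing gains reported for the two test images.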
Fig. 2.11a, 2.11b, 2.11c  Steepest Descent - mMSE
(Further panel titles: LMS Algorithm; Fletcher-Reeves Method;
Davidon-Fletcher-Powell Method. Axes: iteration number;
processing gain in dB.)

Table II.2 Results of mMSE Adaptive Spatial Filter (China Lake Image)
These results are exactly the same as those of the
optimum mMSE filter.


4. Results of MSNR Adaptive Spatial Filter I -
Indiana Image

The test results of MSNR adaptive spatial filters using the
Indiana test image are presented in the following figures:

Fig. 2.17 - Gradient approach, steepest descent method

Fig. 2.18 - Gradient approach, accelerated steepest descent method

Fig. 2.19 - Conjugate gradient approach, Fletcher-Reeves method

Fig. 2.20 - Conjugate gradient approach, Pollack method

Fig. 2.21 - Variable metric approach, Davidon-Fletcher-Powell method

Fig. 2.22 - Amir's transform approach.

In each figure, four results are presented as functions of
iteration steps: filter coefficients, output variance,
processing gain and output signal to noise ratio.

Further, additional numerical results are summarized and
presented in Table II-3:

- Output signal to noise ratio

- Processing gain

- Mean of filtered image

- Variance of filtered image

- Number of iteration steps to reach below the prescribed
stopping error

- Actual adaptation error.

Discussion:

a. In the mMSE adaptive filter study, we first presented the
results of adaptive filter design by the LMS algorithm because
it is the most frequently used method. We extended it to two
dimensions and used it as a benchmark for comparison. For the
MSNR criterion, we have not yet found any past study of adaptive
filters using this criterion. Therefore, comparisons of
convergence results are based on several methods developed in
this thesis study.

b. However, we can compare the background clutter suppression
results of the mMSE and MSNR adaptive filters. For point targets,
their steady state filter coefficients are the same if the
coefficients of the estimation pixel are all normalized to
unity. Therefore, the statistical properties of the filtered
image are the same, i.e., the error variance and the mean of
the image after processing by the two types of filters are
identical. For the Indiana image, the mean and variance of the
unfiltered and filtered images are:

            Before filtering    After filtering
mean        3.30397             0.00006495
variance    0.74111             0.012

c. The convergence speeds are different, as shown in Table II.3.
For a stopping condition of 10 the numbers of iteration steps to
reach below this condition are:

SD  = 739    CGF = 76    DFP = 25
ASD = 739    CGP = 76    AT  = 2

Fig. 2.17a Steepest Descent Method, Filter Vector
Fig. 2.17b Steepest Descent Method - MSNR
Fig. 2.20a Pollack Method - MSNR

The same trend found in the mMSE cases is found in the MSNR
cases. The variable metric method (DFP) is faster than the
conjugate gradient methods (CGF, CGP), which are faster than
the gradient methods (SD, ASD).

It is important to point out that the transform method (AT),
which does not have a corresponding method in the mMSE cases,
has the fastest convergence speed. It took only two steps,
compared with the twenty-five steps required for the variable
metric method, to reach below the stopping condition.

5. Results of MSNR Adaptive Spatial Filters II -
China Lake Image

Test results of MSNR adaptive spatial filters using the China
Lake test image are presented in the following figures:

Fig. 2.23 - Gradient approach, steepest descent method

Fig. 2.24 - Gradient approach, accelerated steepest descent method

Fig. 2.25 - Conjugate gradient approach, Fletcher-Reeves method

Fig. 2.26 - Conjugate gradient approach, Pollack method

Fig. 2.27 - Variable metric approach, Davidon-Fletcher-Powell method

Fig. 2.28 - Amir's transform method.

Several numerical results are presented in Table II.4:

- Output signal to noise ratio

- Processing gain

- Mean of filtered image

- Variance of filtered image

- Number of iteration steps to reach below the prescribed
stopping error

- Actual adaptation error.

Discussion 

a.  Gradient  approaches  have  not  been  included  in 
these  presentations  because  their  convergence  speeds  are  not 
as  fast  as  the  conjugate  gradient,  variable  metric  and  Amir’s 
transform  methods. 

b.  Again,  the  Amir  transform  method  has  the  fastest 
convergence  speed.  It  only  took  three  steps  to  reach  below 
the  stopping  condition  compared  with  fifteen  steps  required 
by  the  next  fastest  method,  the  variable  metric  method. 

c.  Based  on  the  experience  using  the  Indiana  image 
and  the  China  Lake  image,  the  comparative  behaviors  among 
these  methods  are  similar. 
Fig. 2.23c Steepest Descent Method - MSNR, Processing Gain [dB] versus iteration number
Fig. 2.25c Fletcher-Reeves Method - MSNR, Processing Gain [dB] versus iteration number
Fig. 2.25d Fletcher-Reeves Method - MSNR, Output SNR [dB] versus iteration number
Fig. 2.26b Pollack Method - MSNR

Table II-4 Results of MSNR Adaptive Spatial Filter (China Lake Image)


III.  THE  MULTIPLE  MICROCOMPUTER  SYSTEM 


A.  INTRODUCTION 
1. General

Signal processing algorithms are usually developed on mainframe
computers. The transfer of these algorithms to on-board
processors in practical systems is, in general, not an easy task
because there are many constraints in real systems such as the
processing speed, weight, volume, power, fault tolerance and
others. This thesis undertook both the theoretical development
task and the practical implementation investigation.
Specifically, this chapter presents the second part of this
thesis, which considers the implementation of the adaptive image
processing algorithms developed in the last chapter on a multiple
microcomputer system using concurrent parallel and pipeline
processing.

It is important to point out that the digital computer is not
the only technique for real time implementation. Depending on
the amount and rate of signal data, precision and dynamic range
requirements, need for programmability and several other factors,
different approaches to signal formats, device technologies and
signal/data processor architectures should be considered. In
many cases, combinations of analog, sampled analog and digital
processing approaches using optical, electronic and acoustical
devices probably will offer cost-effective and optimum
performance [178-180]. However, with the rapid advances of
VLSI/VHSIC technologies in both increasing speed and decreasing
power, size and cost, the importance of digital electronic
implementation in the form of distributed processing using
multiple processors has been increasing at a rapid rate and will
undoubtedly play a more and more important role in real on-board
implementation.

This part of the thesis investigates the feasibility and
potential of using multiple microcomputer systems for real
implementation.

2. Multiple Processor Developments

Multiple microcomputer systems are a subset of larger families
of multiple processor systems whose development started over
twenty years ago. It was obvious for a long time that several
processors are better than one. However, how they should be
connected together and used effectively has not been obvious at
all. The answer depends on many factors. First, what is the
objective? Is it real time processing, fault tolerance, multiple
users, security, or some combination of these? Second, what are
the available technologies in both hardware and software? Third,
what are the constraints in cost, weight, volume, development
time and available manpower? The answers have been very
different depending on many of these factors. We can identify
several major areas of multiple processor developments since the
early 1960's.


a. Supercomputers [151, 152]

The first area can be generally called the "supercomputers."
Several processors were connected in different ways to offer
parallel processing [153-155], pipeline processing [156-158] or
combined parallel/pipeline processing capabilities. In some
cases, specially designed signal processors, called array
processors, are connected to a host computer to offer very fast
data crunching capabilities. In most of these cases, the basic
processing elements that form the multiple processor systems are
special arithmetic or signal processing units, not stand-alone
computers. Their intercommunications and the signal flow are
usually fixed in the design stage to achieve very fast computing
speed and are not changed during operation. Several
representative systems are listed in Table III-1. Their common
objective is "fast computation" and "high throughput." The
processing elements are tightly coupled.

b. Computer Networks [161, 162]

The second area can be generally called the "computer network."
Several processors are connected together for intercommunication
and resource sharing. The basic processing elements are mainly
stand-alone computers. A problem is usually not partitioned and
performed concurrently on several processing elements. The system
is, in general, loosely coupled. The communication is carried out
by messages with appropriate synchronization codes at the
beginning and the ending of the message.

c. Ultra-Reliable Fault Tolerant or Highly Available,
Graceful Degrading Computers [166, 167]

The third area can be generally called "Fault Tolerant or Highly
Available" computers. Multiple processing elements have been
connected in different ways to offer either fail-soft, fail-safe
or graceful degradation capabilities. In most fault tolerant
computers, the redundancy and/or sparing are usually made at the
building block levels, such as the CPU, RAM, I/O ports, etc., to
make a very reliable and fault tolerant single computer [168].
The intercommunications among the elements are generally fixed.
In recent years, because of the steady decrease of the cost of a
computer, the basic processing elements in a multiple processor
system are a small number of stand-alone computers [169-171].
These systems started a new direction in the multiple processor
developments because the intercommunications among the processing
elements are no longer fixed. The processing tasks can be
flexibly assigned to different processors. This dynamic
assignment, or allocation capability, also allows a new
system/software approach to fault tolerance and fault repair.

3. Multiple Microcomputer System Developments

The rapid advance of low cost and small microcomputers has
extended the development described above into a new dimension
because a large number of microcomputers, instead of just a few,
can conceivably be interconnected into a system. Not only can
its fault tolerance capability be further increased, but the
computational or signal processing capability can also be much
enhanced by providing concurrent parallel and pipeline
processing capabilities.

Multiple minicomputer system development began at Carnegie
Mellon University with the Cmmp system [172]. Although it used
PDP-11 minicomputers, its tightly coupled architecture and
dynamic memory allocation concept allowed a relatively large
number of processing elements to join together into a single
system. This development was soon followed by a tightly coupled
multiple microcomputer project, CM* [173], also at Carnegie
Mellon University. Since that time, several tightly coupled
systems have been proposed [174 to 183]. Some of them have gone
beyond the conceptualization stage and started serious
hardware/software development efforts. However, none has reached
the operational stage at this writing.

At the same time, another direction of multiple microcomputer
development has been pursued toward the "computer network"
objective [184-188]. These systems can be distinguished from the
developments described above in the following major aspects:

° Different types of processing elements are used. In other
words, they are "heterogeneous."

° The processing elements are loosely coupled.

° The bandwidth of the intercommunication buses is relatively
low.


4. This Thesis Research

The second part of this thesis research is to develop a multiple
microcomputer system and to investigate its feasibility in
implementing real time on-board signal/data processing for a
smart sensor system. It is similar to a number of multiple
microcomputer systems in development in the past three to four
years which permit up to 16 microcomputers to be interconnected
in some way to perform computations. However, their objectives,
architectures, intercommunication concepts, controllers, hardware
buses and processing elements, software operating systems, etc.
are quite different.

This thesis project is presented by highlighting the following
features:

a. Its objectives are to provide a multiple tasking system
including fast image/signal processing capability and other more
moderate speed but highly reliable signal/data processing
capability for system management, command and control.

b. Some of the signal/data processing tasks will be performed by
tightly coupled processors. But the processors performing other
tasks do not have to be all tightly coupled together. Therefore,
a mixed tightly and loosely coupled system is envisioned.

c. A part of the system must perform some critical tasks which
require ultra-reliability. Other parts of the system only
require fail-soft and graceful degradation performance. In any
event, a dynamic allocation capability is required which allows
flexible assignment of microcomputers to perform various tasks,
which provides some fault tolerance.

For these requirements, a multiple star/multiple cluster system
of 16-bit microcomputers was developed. Its general concept and
philosophy were developed by a top-down system design procedure
which will be presented in the next Section, III.B. It will be
explained how a choice was made considering several alternatives
and seven important issues related to the system. In Section
III.C, detailed implementation of these choices will be presented
by describing the principles and circuits of this multiple
microcomputer system in five categories:

System  architecture 
Processing  resources 
Intercommunication  network 
Intercommunication  procedures 
Multibus  communication. 

The  performance  of  this  system  is  described  in  Section  III.E. 

B.  DESIGN  CONSIDERATIONS  FOR  THIS  MULTIPLE 
MICROCOMPUTER  SYSTEM 

1. Introduction

Although only two large multiple microcomputer systems and one
multiple minicomputer system have appeared in the literature and
reached operational status, a large number of different
architectures have been proposed and some are in the process of
being implemented. The three operational systems are all from
Carnegie-Mellon University: CM* [172], Cvmp [191] and Cmmp [173].
There are now many options for the hardware and software design
of a multiple microcomputer system.

This thesis took a top-down system design approach to reach the
choices made for the design of our system. This design process
is presented in several steps in this section to explain the
general idea and philosophy of this system. In the next section,
the detailed design of various parts will be described.

2. Architecture

This thesis is primarily concerned with the implementation of
adaptive image processing. It is important, however, to realize
that the adaptive filter is only one part of a longer end-to-end
image processing program for detecting, tracking and recognizing
targets in noisy images. The adaptive spatial filter is used to
enhance the target signal to noise ratio by suppressing the
background clutter; this ratio may be enhanced further by
additional image processing techniques, such as adaptive temporal
filters. The clutter suppression stage is followed by
thresholding, target acquisition, recognition and tracking
stages. These signal processing operations are quite different.
For example, adaptive spatial filters require the computation of
statistical image characteristics and the solution of matrix
equations. Adaptive thresholding requires the comparison and
rearrangement of real numbers. Target acquisition usually
involves pattern tests of numbers based on spatial, temporal
and/or spectral information. Therefore, although each individual
signal processing stage requires real-time or fast execution
speed, different signal processing stages do not depend on one
another during processing. Furthermore, it is important to
realize that processing of target signals for the mission
objective is only one part, although a very important part, of
the total signal/data processing requirements for the whole
system. There are processing functions such as management,
control, communication and others which must also be implemented.
The nature and requirements of their processing operations are
quite different and vary over a wide range. Some do not need high
processing speed but demand very high reliability. Others do
limited computation but handle large amounts of data. In general,
the signal/data processing requirements of many systems cover a
wide range. Therefore, we designed an architecture which has
several levels of coupling among processing elements.

At the first level, special processors may be directly coupled
to a microcomputer. At the second, higher level, several
microcomputers are connected to the same system bus in parallel
and form a "cluster." A microcomputer can communicate with any
other microcomputer on the same bus, or within the same cluster,
directly through common memory. It is a tightly coupled, bus
oriented multiple microcomputer architecture. At a higher level,
the third level, four clusters are connected by way of a
"complete star" bus switch network and form a "star." The
communication of microcomputers between two clusters is
accomplished by way of the switch network. Therefore, they are
not as tightly coupled as microcomputers within a cluster because
there will be more overhead in intercluster communication than
intracluster communication. However, it was found that using
specially designed controllers for the intercluster
communication, the access time was increased by only 9%. This
data is presented in Section III.E. Therefore, we can consider
that microcomputers in different clusters within the same star
are still tightly coupled. At the next higher level, the fourth
level, several "stars" are connected together by linking nearest
neighboring "stars" through a bus switch to form a "lattice
network." The intercommunication between microcomputers from two
stars is similar to that within a star, involving one central
controller and two distributed controllers. The overhead is
practically the same. Therefore, from the intercommunication
viewpoint, microcomputers from two stars, and also throughout the
system, are practically all tightly coupled. However, through
programming, they may be used either in tight coupling, loose
coupling or any combination in between to suit the requirements
of the applications.

3. Intercommunication and Control

Because of the hierarchical structure of the architecture, the
intercommunication processes and their controls are also
hierarchical and are distributed. They are hierarchical because
there are three levels of controls as shown in Table III.1.

At the lowest level of intracluster communication, no bus switch
is needed. A Random Priority Controller (RPC) is used for
arbitration. Only a small portion of the distributed controller
is used, mainly to check if requests outside the cluster have
been granted. At the next higher level of intercluster
communication, the intrastar bus switch is used. Arbitration is
accomplished by both the distributed controller and the RPC. Only
a portion of the central controller is used to grant the
intercluster request. At the highest level, both interstar and
intrastar bus switches may be used and all controllers, central,
distributed and random priority, are in full action.
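The three control levels can be summarized as a small lookup. The sketch below assumes each microcomputer is identified by a hypothetical (star, cluster, board) address and returns which bus switches and controllers the text above associates with a transfer between two addresses. The names `Address` and `control_path` are illustrative only and do not come from the thesis hardware or software.

```python
from typing import NamedTuple

class Address(NamedTuple):
    """Hypothetical address of one microcomputer in the hierarchy."""
    star: int
    cluster: int
    board: int

def control_path(src: Address, dst: Address) -> dict:
    """Return the control level and resources involved in a transfer."""
    if src.star != dst.star:
        # Highest level: both switches may be used, all controllers active.
        return {"level": "interstar",
                "bus_switches": ("interstar", "intrastar"),
                "controllers": ("central", "distributed", "RPC")}
    if src.cluster != dst.cluster:
        # Intercluster: intrastar switch, part of the central controller.
        return {"level": "intercluster",
                "bus_switches": ("intrastar",),
                "controllers": ("central (partial)", "distributed", "RPC")}
    # Intracluster: no switch, RPC arbitrates, distributed controller
    # only checks whether outside requests have been granted.
    return {"level": "intracluster",
            "bus_switches": (),
            "controllers": ("distributed (partial)", "RPC")}

if __name__ == "__main__":
    print(control_path(Address(0, 1, 2), Address(0, 3, 0)))
```

A table of this kind makes it easy to see that a transfer inside one cluster never touches a bus switch, which is where the small intracluster overhead comes from.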

Further, the controllers are distributed because there are four
identical RPC and distributed controllers, one in each cluster.
Although there is only one central controller, it consists of
four identical units, one for each cluster. The advantages of
such a distributed control system are: (1) Parallel control
actions which enhance the speed of "request arbitration." (2)
Improved fault tolerance because the control actions are shared
between four separate units and, should one malfunction, the
other three can still continue their functions.


4. Hardware Implementation of Controllers

Controller circuits can be implemented in several ways:

a. Microprocessor control

b. Bit slice processor control

c. Digital logic circuit control.

Two performance characteristics should be considered in their
choice and design: programmability and speed. The microprocessor
approach has the most versatile programmability but the slowest
speed. The digital logic circuit approach has very limited
programmability but the fastest speed of the three. An estimate
has been made to compare their speeds.

In our design, the primary goal is to offer the fastest response
and arbitration of requests and communication speed. Therefore,
we chose the digital logic circuit approach. Great care was given
to the design of controller concepts and circuits, to avoid
unexpected changes. Further, Schottky and low power Schottky
chips are used due to their speed and power trade-offs. CMOS
chips were found to be too slow and do not have adequate driving
capability.

5. Priority Resolver

There are several ways to arbitrate multiple requests or to
resolve priorities:

Fixed priority (chain)

Rotating priority

FIFO

Random priority

There are two primary requirements for a priority resolver
circuit: uniform and fast resolution of bus requests. In this
system, an Intel Multibus is used as the system bus with a
10 MHz bus clock frequency. We decided to design a priority
resolver circuit to arbitrate 8 SBCs within one bus clock.

The fixed priority approach was not selected because it was
unable to arbitrate multiple bus requests and grant their usages
uniformly. Test results showed that in our tightly coupled
environments, two SBCs are able to share the bus adequately. More
than two SBCs produce unacceptable delays.

Rotating priority is much faster than the fixed priority
approach. It is able to arbitrate multiple requests and does
grant their bus usages uniformly. However, it was not our final
choice because the random priority approach was found to be
faster. This is because in the rotating priority approach, every
bus request line is tested serially (in a rotating manner)
whether there are request signals on these lines or not. In the
worst case, the rotating priority resolver grants the bus after N
searches, where N is the number of SBCs being arbitrated by the
resolver.
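The worst case of N searches can be seen in a short behavioral model of the serial scan. This is a sketch only: the function name, the list-of-booleans representation of the request lines and the `start` pointer are assumptions, not a description of an actual circuit.

```python
def rotating_grant(requests, start):
    """Scan request lines serially from `start`; return (line, probes).

    `requests` is a list of booleans, one per SBC. `probes` counts how
    many lines were tested before a request was found.
    """
    n = len(requests)
    for probes in range(1, n + 1):
        line = (start + probes - 1) % n   # next line in rotating order
        if requests[line]:
            return line, probes
    return None, n                        # no SBC is requesting the bus

if __name__ == "__main__":
    # Only SBC 7 requests the bus while the scan starts at line 0.
    print(rotating_grant([False] * 7 + [True], start=0))
```

With eight SBCs the only pending request may sit on the last line to be tested, so eight serial tests are needed before the grant.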

First-in first-out (FIFO) is a resolver approach
which requires memory. Because of the time needed to reference
the memory, it is not possible to build a FIFO resolver
to arbitrate 8 bus requests within 100 nsec, the bus clock
period. With current technology, a fast FIFO arbiter
probably requires more than 300 nsec.

The random priority resolver is designed based on
the binary tree synchronous selector concept. Consider our
case of 8 SBCs in a cluster. Three-stage selection is used.
During the first stage, four out of eight lines are checked
simultaneously. In the second stage, two out of these four
lines are checked simultaneously again. The final bus grant
is made in the third stage. In other words, the time for
searching and resolving the bus requests is proportional to
log2 N (three stages for N = 8), which is faster than the N
searches of the rotating priority resolver. Test results have
shown that the random priority resolver is able to arbitrate
multiple bus requests and grant their bus usages uniformly as
demonstrated in Fig. 3.17. Four SBCs simultaneously sharing
the bus in a tightly coupled environment are taken for the
test case. These four SBCs were programmed to request the
bus usage almost 100% of the time. The BPRN signals of these
four  SBCs  are  shown.  A  low  signal  of  BPRN  indicates  that  its 
SBC  is  using  the  system  bus.  The  fact  that  none  of  these 
four  traces  showed  any  long  periods  of  bus  usage  or  bus  wait¬ 
ing  demonstrated  that  the  random  priority  resolver  is  able  to 
arbitrate  very  heavy  bus  requests  by  these  four  SBCs  and  grant 
bus usage to them "uniformly." It was found that, on the
average, a "bus request" is granted in about 60 nsec.
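
The difference between the serial search of the rotating priority resolver and the staged search of the binary tree resolver can be sketched in software. The following is only an illustrative model, not the circuit built for this system: the function names are hypothetical, and the coin flip used when both halves of the tree contain requests is an assumption, since the text does not state how the hardware makes that choice.

```python
import random


def rotating_resolver(requests, start):
    """Test request lines one at a time, beginning at `start`.

    Returns (granted line, number of lines tested).
    """
    n = len(requests)
    for step in range(n):
        line = (start + step) % n
        if requests[line]:
            return line, step + 1
    return None, n


def tree_resolver(requests, rng=random):
    """Halve the set of candidate lines at each stage (binary tree selection).

    Returns (granted line, number of stages). The random choice between two
    requesting halves is an assumption made for this sketch.
    """
    if not any(requests):
        return None, 0
    lo, hi = 0, len(requests)
    stages = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left = any(requests[lo:mid])
        right = any(requests[mid:hi])
        if left and right:
            take_left = rng.random() < 0.5
        else:
            take_left = left
        lo, hi = (lo, mid) if take_left else (mid, hi)
        stages += 1
    return lo, stages


# Worst case for the rotating search: the only request is on the last line tested.
only_last = [False] * 7 + [True]
rotating_result = rotating_resolver(only_last, start=0)
tree_result = tree_resolver(only_last)
```

For eight request lines the serial search may test all eight lines before granting the bus, while the tree search always finishes in three stages.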

6.  Bus  Switches 

Bus  switches  are  one  of  the  most  important  parts  of 
a  multiple  microcomputer  system  because  they  provide  the  in¬ 
terconnection  means  among  the  processing  resources.  There 
are  two  aspects  of  the  "bus  switch"  problem:  bus  switch 
network  and  the  individual  bus  switch  link. 

Many switch networks have been investigated, some of
which predate the development of computers [195]. A small
number of them have been considered for multiple microcomputer
development: cross-bar, banyan, hyperconcentrator, simple
ring, etc.

A combined approach with two levels of switching
networks was selected because of the multitask signal/data
processing requirements of a typical system. At the higher
level, many stars are interconnected in a lattice
architecture. Interstar bus switches are provided between
neighboring nodes. At the lower level, four clusters are
included in each "star" node. They are interconnected by a
"complete star bus switch" network. The complete star switch
is chosen for two reasons. First, the coupling within a star
should be as tight as possible. The complete star switch
allows us to connect two clusters by the shortest link.
Second, if a link fails, the complete switch gives us two
choices for connecting the two clusters by way of a third
cluster via two links, thus providing some fault tolerance.
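
The fault tolerance argument for the complete star switch can be checked with a small enumeration. This is a sketch under stated assumptions: the cluster labels A to D and the function names are hypothetical, and each bus switch link is modeled simply as an unordered pair of clusters.

```python
from itertools import combinations

CLUSTERS = ["A", "B", "C", "D"]  # hypothetical labels for the four clusters


def complete_links(nodes):
    """Every pair of clusters gets its own direct bus switch link."""
    return [frozenset(pair) for pair in combinations(nodes, 2)]


def two_link_detours(src, dst, nodes, failed=()):
    """Routes from src to dst through one intermediate cluster, avoiding failed links."""
    failed_links = {frozenset(link) for link in failed}
    routes = []
    for via in nodes:
        if via in (src, dst):
            continue
        hops = [frozenset((src, via)), frozenset((via, dst))]
        if not any(hop in failed_links for hop in hops):
            routes.append((src, via, dst))
    return routes


links = complete_links(CLUSTERS)
detours = two_link_detours("A", "B", CLUSTERS, failed=[("A", "B")])
```

With four clusters the complete network needs six links, and the loss of the direct link between two clusters leaves two alternative routes, each passing through a third cluster over two links.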

The important part of the individual bus switch link
is the switches themselves. For the Intel Multibus, we found
that 58 of the 86 lines should be switched. There are several
choices for the switches:

Bidirectional:  MOS  types  of  switches,  such  as  CMOS,  VMOS 
and  DMOS. 

Unidirectional:  Bipolar  types  such  as  Schottky,  low 
power  Schottky  and  ECL  types;  Optoelectronic  types. 

Optoelectronic types of switches were not chosen because
they are slow, on the order of 10 µsec. Very fast switching
speeds, on the order of several tens of nanoseconds, are
required because today's Multibus runs at 10 MHz, which
corresponds to a clock period of only 100 nsec. CMOS, VMOS and
DMOS switches could provide such switching speeds. However,
they do not have enough driving capability for the 15 mA
or more required by many of the control and address signals
of the microcomputer. Therefore, these MOS switches were not
chosen, although their bidirectional feature and the low power
characteristics of the CMOS switches are extremely attractive
and reliable. We chose the low power Schottky switches because
of their speed and driving capability. A typical performance
is shown in Fig. 3.15, which shows the waveform of an
address signal before and after the switch. It can be seen
that not only is the delay short, on the order of 25 nsec,
but also the waveform is improved by the switch because of
its good driving capability of up to 50 mA. It was tested
with a minimum load resistance of 50 ohms and a maximum
capacitance of 270 pF, and the switch continued to function
satisfactorily up to 45 MHz. One disadvantage is the need to
use two back-to-back switch circuits for bidirectional
switching of each signal. Therefore, a special circuit was
designed to provide not only the "enable" signal but also the
"direction" signal.
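
The role of the "enable" and "direction" signals for a pair of back-to-back unidirectional switches can be summarized as a small logic model. This is only a behavioral sketch; the function and parameter names are hypothetical and the actual control circuit is not described at this level of detail in the text.

```python
def bidirectional_switch(enable, a_to_b, side_a, side_b):
    """Model one signal line switched by two back-to-back unidirectional drivers.

    Returns the logic levels seen on (side A, side B) after the switch.
    """
    if not enable:
        # Both drivers are off, so the two bus segments stay isolated.
        return side_a, side_b
    if a_to_b:
        # Only the A-to-B driver is on: side B follows side A.
        return side_a, side_a
    # Only the B-to-A driver is on: side A follows side B.
    return side_b, side_b


isolated = bidirectional_switch(enable=False, a_to_b=True, side_a=1, side_b=0)
forward = bidirectional_switch(enable=True, a_to_b=True, side_a=1, side_b=0)
reverse = bidirectional_switch(enable=True, a_to_b=False, side_a=1, side_b=0)
```

The point of the model is that the two drivers of a pair are never enabled together, which is why a direction signal is needed in addition to the enable signal.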

7. Processing Elements

There are two major types of processing elements on
the system bus: general purpose microcomputers and special
purpose processors. The latter can be further separated into
two subcategories. One is a special purpose processor like an
array processor which can perform several signal processing
operations such as fast Fourier transform (FFT), correlation,
convolution, finite impulse response filtering, infinite
impulse response filtering, etc. The second type is a special
purpose processor which is designed to perform only one signal
processing operation such as the FFT.

a.  General  Purpose  Microcomputer 

It was decided that all general purpose microcomputers
used in our system should be treated homogeneously.
This is necessary because two major principles of our
operating system are based on the "virtual processor" [189] and
"dynamic process allocation" [190] concepts which require
homogeneous processing elements.

b.  Special  Purpose  Processors 

It was decided that special purpose processors
could not be treated in the same way as the microcomputers.
However, it has not been decided at this time exactly how
these special purpose processors should be handled. There
are two important alternatives. In one case, a special
purpose processor is treated as an I/O port managed by the
operating system. In the other case, a special purpose
processor can be operated in a "slave" mode on the system bus.

8. Mode of Data Transfer

The basic mode of data transfer in most multiple
processor systems is based on "message transfer"
communication. However, a basic philosophy of our operating
system is the "loop free" structure which requires frequent
synchronization primitive references. In other words, the
operating system program on a microcomputer needs to reference
synchronization primitives located in either internal
or external global memories. These "references" are executed
via the system bus. If the data transfer is "message" based,
the synchronization of processes could be delayed because
the system bus is being occupied by a long message transfer.
In order to avoid such a delay, it was decided that the basic
mode of data transfer should be based on the "word transfer."
This allows several microcomputers to reference their
synchronization primitives and other data in an "interleave"
mode.

However, the transfer of data in "blocks" is possible
if required. This is accomplished by a special feature of
the Intel 16 bit 8086 microprocessor which can generate a bus
lock signal of a duration specified by software. This bus
lock signal holds the bus for the completion of the block
transfer. Thus, data transfer by "message based
communication" is possible as well.
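
The reason for preferring word transfer can be illustrated with rough arithmetic. All numbers below are assumptions chosen for illustration only (the time per word transfer and the message length are hypothetical), and the interleaved case further assumes that arbitration is perfectly fair, so that a new request waits for at most one word from each of the other SBCs.

```python
def worst_case_wait_ns(transfers_ahead, bus_cycle_ns):
    """Time a new bus request waits while `transfers_ahead` word transfers complete."""
    return transfers_ahead * bus_cycle_ns


BUS_CYCLE_NS = 1000   # assumed time for one word transfer (hypothetical)
MESSAGE_WORDS = 256   # assumed length of a long message (hypothetical)
OTHER_MASTERS = 7     # the other SBCs in a cluster of eight

# Message based: the bus is held until the whole message has been sent.
message_wait_ns = worst_case_wait_ns(MESSAGE_WORDS, BUS_CYCLE_NS)

# Word based: at most one word from each of the other masters is ahead of us.
word_wait_ns = worst_case_wait_ns(OTHER_MASTERS, BUS_CYCLE_NS)
```

Under these assumptions a synchronization reference could wait 256 µsec behind a message but only 7 µsec in the interleaved word mode.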

C.  DESCRIPTION  OF  THIS  MULTIPLE  MICROCOMPUTER  SYSTEM 

1.  Introduction 

In the last section, we presented the reasons
for choosing the specific approaches for various parts of
our multiple microcomputer system based on a top-down design
procedure to meet the requirements of this type of smart
sensor system. In this section, a more detailed description
is given to explain how those choices are implemented.

The  presentation  will  be  made  in  five  major  categories: 

System  architecture  (Section  C.2) 

Processing  resources  (Section  C.3) 

Intercommunication  network  (Section  C.4) 

Intercommunication  procedures  among  resources 
in  different  clusters  and  stars  (Section  C.5) 

Multibus  communication  (Section  C.6) 

Performance  of  this  multiple  microcomputer  system 
will  be  presented  in  Section  D. 

2.  System  Architecture 

The topology of this system consists of many "star"
nodes interconnected by links to nearest neighbor stars. A
two dimensional example is shown in Fig. 3.1. Each star has
four links connected to its four neighbors. The links are
bidirectional system buses with a bus switch, called the
"inter-star bus switch" (ISBSW). The "bus switch" consists
of 60 bidirectional switches for 60 signal lines. Two types
of switches have been investigated: one with latches and
one without latches for the signal lines.

Figure 3.1 Two Dimensional Lattice Architecture of
Multiple Star Multiple Microcomputer System

Each  "star"  consists  of  four  clusters  interconnected 
by  a  complete  star  "bus-switch  network."  Each  "cluster" 
consists  of  up  to  eight  microcomputers.  Other  processing 
elements  and  one  or  more  RAM  boards  are  also  connected  onto 
the  system  Multibus.  Fig.  3.2  depicts  the  topology  of  a 
single  star  with  four  clusters.  In  this  example,  the  bus 
switch  network  consists  of  six  bidirectional  system  buses, 
each  with  a  bus  switch  interconnected  as  shown  in  Fig.  3.7. 

3.  Processing  Resources 

Two types of processing resources are used in this
system.

a.  Basic  Processing  Elements  -  SBC  8612A 

Intel's  16  bit  single  board  microcomputers,  SBC 
8612A,  are  used  as  the  basic  processing  elements.  A  block 
diagram  of  the  SBC  8612A  is  shown  in  Fig.  3.3. 

(1) The Single Board Microcomputer SBC-8612A.

The iSBC 8612A Single Board Computer is a 16 bit single board
computer, a complete computer system on a single printed
circuit assembly. The iSBC 8612A board includes a 16 bit central
processing unit (CPU), up to 32K bytes of dynamic RAM, a serial
communications interface, three programmable parallel I/O
ports, three programmable timers, priority interrupt control,
Multibus interface control logic, and bus expansion drivers
for interface with other Multibus interface-compatible
expansion boards. Also included is dual port control logic to
allow the iSBC 8612A board to act as a slave RAM device to
other Multibus interface masters in the system. Provision
is made for user installation of up to 16K bytes of read
only memory.

Figure 3.3 Architecture of the Intel Single Board Computer

The iSBC 8612A Single Board Computer is
controlled by an Intel 8086 16 bit microprocessor (CPU).
The 8086 CPU includes four 16 bit general purpose registers
that may also be addressed as eight 8 bit registers. In
addition, the CPU contains two 16 bit pointer registers and
two 16 bit index registers. Four 16 bit segment registers
allow extended addressing to a full megabyte of memory. The
CPU instruction set supports a wide range of addressing modes
and data transfer operations, signed and unsigned 8 bit and
16 bit arithmetic including hardware multiply and divide, and
logical and string operations. The CPU architecture features
dynamic code relocation, reentrant code, and instruction
look-ahead.
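
The way the 16 bit segment registers extend addressing to a full megabyte can be shown with the standard 8086 address calculation, in which the segment value is shifted left by four bits and added to a 16 bit offset. The function name below is hypothetical; the arithmetic is that of the 8086 itself.

```python
def physical_address(segment, offset):
    """Form a 20 bit 8086 physical address from a 16 bit segment and offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16 bit values")
    # Segment * 16 + offset, truncated to the 20 address lines of the 8086.
    return ((segment << 4) + offset) & 0xFFFFF


example = physical_address(0x1234, 0x0010)
top_of_memory = physical_address(0xFFFF, 0x000F)
wrapped = physical_address(0xFFFF, 0x0010)
```

Because only 20 address lines exist, a sum that exceeds FFFFFH wraps around to the bottom of the address space.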

The iSBC 8612A board has an internal bus for
all on-board memory and I/O operations and accesses the system
bus (Multibus interface) for all external memory and I/O
operations. Hence, local (on-board) operations do not involve
the Multibus interface, making the Multibus interface
available for true parallel processing when several bus masters
(e.g., DMA devices and other single board computers) are
used in a multimaster scheme.

Dual port control logic is included to
interface the dynamic RAM with the Multibus interface so
that the iSBC 8612A board can function as a slave RAM device
when not in control of the Multibus interface. The CPU has
priority when accessing on-board RAM. After the CPU
completes its read or write operation, the controlling bus
master is allowed to access RAM and complete its operation.
Where both the CPU and the controlling bus master have the
need to write or read several bytes or words to or from
on-board RAM, their operations are interleaved. For CPU access,
the on-board RAM addresses are assigned from the bottom up
of the 1 megabyte address space; i.e., 00000H-07FFFH. The
slave RAM address decode logic includes jumpers and switches
to allow positioning the on-board RAM into any 128K segment
of the 1 megabyte system address space.

The slave RAM can be configured to allow
either 8K, 16K, 24K, or 32K access by another bus master.
If the iSBC 300 Multimodule RAM option is installed, the
memory increments are 16K, 32K, 48K, or 64K. Thus, the RAM
can be configured to allow other bus masters to access a
segment of the on-board RAM and still reserve another segment
strictly for on-board use. The addressing scheme accommodates
both 16 bit and 20 bit addressing.

Four IC sockets are included to accommodate
up to 16K bytes of user-installed read only memory.
Configuration jumpers allow read only memory to be installed
in 2K, 4K, or 8K increments.

The iSBC 8612A board includes 24 programmable
parallel I/O lines implemented by means of an Intel
8255A Programmable Peripheral Interface (PPI). The system
software is used to configure the I/O lines in any
combination of unidirectional input/output and bidirectional
ports. The I/O interface may be customized to meet specific
peripheral requirements and, in order to take full advantage
of the large number of possible I/O configurations, IC sockets
are provided for interchangeable I/O line drivers and
terminators. Hence, the flexibility of the parallel I/O
interface is further enhanced by the capability of selecting
the appropriate combination of optional line drivers and
terminators to provide the required sink current, polarity,
and drive/termination characteristics for each application.
The 24 programmable I/O lines and signal ground lines are
brought out to a 50 pin edge connector (J1) that mates with
flat, woven, or round cable.

The  RS232C  compatible  serial  I/O  port  is 
controlled  and  interfaced  by  an  Intel  8251A  USART  (Universal 
Synchronous/Asynchronous  Receiver/Transmitter)  chip.  The 
USART  is  individually  programmable  for  operation  in  most 
synchronous  or  asynchronous  serial  data  transmission  formats 
(including  IBM  Bi-Sync). 

In the synchronous mode the following are
programmable:

a. Character length,

b. Sync character (or characters), and

c. Parity.

In the asynchronous mode the following are
programmable:

a. Character length,

b. Baud rate factor (clock divide ratios of 1, 16, or 64),

c. Stop bits, and

d. Parity.

In both the synchronous and asynchronous
modes, the serial I/O port features half- or full-duplex,
double buffered transmit and receive capability. In
addition, USART error detection circuits can check for parity,
overrun, and framing errors. The USART transmit and receive
clock rates are supplied by a programmable baud rate/time
generator. These clocks may optionally be supplied from an
external source. The RS232C command lines, serial data lines,
and signal ground lines are brought out to a 50 pin edge
connector (J2) that mates with flat or round cable.

Three independent, fully programmable 16 bit
interval timer/event counters are provided by an Intel 8253
Programmable Interval Timer (PIT). Each counter is capable
of operating in either BCD or binary modes; two of these
counters are available to the system's designer to generate
accurate time intervals under software control. The outputs
and gate/trigger inputs of two of these counters may be
independently routed to the 8259A Programmable Interrupt
Controller (PIC). The gate/trigger inputs of the two
counters may be routed to I/O terminators associated with
the 8255A PPI or as input connections from the 8255A PPI.
The third counter is used as a programmable baud rate
generator for the serial I/O port. In utilizing the iSBC 8612A
board, the systems designer simply configures, via software,
each counter independently to meet system requirements.
Whenever a given time delay or count is needed, software
commands to the 8253 PIT select the desired function.
The contents of each counter may be read at any time during
system operation with simple operations for event counting
applications, and special commands are included so that the
contents of each counter can be read "on the fly."
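
As an illustration of how the third counter might be set up as a baud rate generator, the count loaded into the timer can be estimated from the timer input clock, the desired baud rate, and the baud rate factor selected in the USART. The sketch below is an assumption-laden example: the function name is hypothetical, and the 1.2288 MHz input clock is an assumed value chosen for the example, not a figure taken from this system.

```python
def baud_divisor(clock_hz, baud, factor=16):
    """Count to load into a 16 bit timer so that its output clocks the USART.

    `factor` is the USART baud rate factor (1, 16, or 64).
    """
    if factor not in (1, 16, 64):
        raise ValueError("baud rate factor must be 1, 16, or 64")
    divisor = round(clock_hz / (baud * factor))
    if not 1 <= divisor <= 0xFFFF:
        raise ValueError("divisor does not fit in a 16 bit counter")
    return divisor


ASSUMED_CLOCK_HZ = 1_228_800  # hypothetical timer input clock

count_9600 = baud_divisor(ASSUMED_CLOCK_HZ, 9600, factor=16)
count_300 = baud_divisor(ASSUMED_CLOCK_HZ, 300, factor=64)
```

With the assumed clock, 9600 baud at a factor of 16 requires a count of 8, and 300 baud at a factor of 64 requires a count of 64.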

The  iSBC  8612A  board  provides  vectoring  for 
bus  vectored  (BV)  and  non-bus  vectored  (NBV)  interrupts.  An 
on-board  Intel  8259A  Programmable  Interrupt  Controller  (PIC) 
handles  up  to  eight  NBV  interrupts.  By  using  external  PICs 
slaved  to  the  on-board  PIC  (master) ,  the  interrupt  structure 
can  be  expanded  to  handle  and  resolve  the  priority  of  up  to 
64  BV  sources. 

The PIC, which can be programmed to respond
to edge-sensitive or level-sensitive inputs, treats each
"true" input signal condition as an interrupt request. After
resolving the interrupt priority, the PIC issues a single
interrupt request to the CPU. Interrupt priorities are
independently programmable under software control. The
programmable interrupt priority modes are:

(a) Nested Priority. Each interrupt
request has a fixed priority: input 0 is highest, input 7
is lowest.

(b) Fully Nested Priority. This mode is
the same as the nested mode, except that when a slave PIC is
being serviced, it is not locked out from the master PIC
priority logic and when exiting from the interrupt service
routine, the software must check for pending interrupts from
the slave PIC just serviced.

(c) Auto-Rotating Priority. Each interrupt
request has equal priority. Each level, after receiving
service, becomes the lowest priority level until the next
interrupt occurs.

(d) Specific Priority. Software assigns
lowest priority. Priority of all other levels is in
numerical sequence based on lowest priority.

(e) Special Mask. Interrupts at the level
being serviced are inhibited, but all other levels of
interrupts (higher and lower) are enabled.

(f) Poll. The CPU internal interrupt
enable is disabled. Interrupt service is achieved by
programmer initiative using a Poll command.
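
The auto-rotating priority mode (c) listed above can be pictured with a short model. This is a behavioral sketch only, with hypothetical names; it follows the description that a level becomes the lowest priority after it is serviced, so the level that follows it in numerical order becomes the highest.

```python
def next_to_service(pending, lowest):
    """Pick the pending interrupt level with the highest rotating priority.

    `pending` is a set of levels 0..7 and `lowest` is the level that
    currently has the lowest priority (the one serviced most recently).
    """
    for step in range(1, 9):
        level = (lowest + step) % 8
        if level in pending:
            return level
    return None


# Levels 0 and 3 request service continuously; level 7 starts as lowest.
pending = {0, 3}
lowest = 7
order = []
for _ in range(4):
    level = next_to_service(pending, lowest)
    order.append(level)
    lowest = level  # the serviced level drops to the lowest priority
```

In this model two continuously requesting levels are serviced alternately, which is the equal-priority behavior described for this mode.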

The CPU includes a non-maskable interrupt
(NMI) and a maskable interrupt (INTR). The NMI interrupt is
intended to be used for catastrophic events such as power
outages that require immediate action of the CPU. The INTR
interrupt is driven by the 8259A PIC which, on demand,
provides an 8 bit identifier of the interrupting source. The
CPU multiplies the 8 bit identifier by four to derive a
pointer to the service routine for the interrupting device.
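
The multiplication by four reflects the layout of the 8086 interrupt vector table, which begins at address 00000H and holds a four byte entry (a 16 bit offset followed by a 16 bit segment, low byte first) for each 8 bit identifier. The sketch below models that lookup on a byte array; the function names and the example vector contents are hypothetical.

```python
def vector_table_address(interrupt_type):
    """Address of the table entry for an 8 bit interrupt identifier."""
    if not 0 <= interrupt_type <= 0xFF:
        raise ValueError("interrupt identifier must fit in 8 bits")
    return interrupt_type * 4


def read_vector(memory, interrupt_type):
    """Return (segment, offset) of the service routine stored in the table."""
    base = vector_table_address(interrupt_type)
    offset = memory[base] | (memory[base + 1] << 8)
    segment = memory[base + 2] | (memory[base + 3] << 8)
    return segment, offset


# Hypothetical example: identifier 20H points to a routine at F000:1234.
memory = bytearray(1024)
memory[0x80:0x84] = bytes([0x34, 0x12, 0x00, 0xF0])
entry_address = vector_table_address(0x20)
routine = read_vector(memory, 0x20)
```

The 256 possible identifiers therefore occupy the first 1K bytes of the address space.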

Interrupt  requests  may  originate  from  18 
sources  without  the  necessity  of  external  hardware.  Two 
jumper-selectable  interrupt  requests  can  be  automatically 
generated  by  the  Programmable  Peripheral  Interface  (PPI)  when 
a  byte  of  information  is  ready  to  be  transferred  to  the  8086 
CPU  (i.e.,  input  buffer  is  full)  or  a  byte  of  information  has 
been  transferred  to  a  peripheral  device  (i.e.,  output  buffer 
is  empty).  Two  jumper-selectable  interrupt  requests  can  be 
automatically  generated  by  the  USART  when  a  character  is 
ready  to  be  transferred  to  the  8086  CPU  (i.e.,  receive  channel 
buffer  is  full)  or  when  a  character  is  ready  to  be  transmitted 
(i.e.,  transmit  channel  data  buffer  is  empty).  A  jumper- 
selectable  interrupt  request  can  be  generated  by  two  of  the 
programmable  counters  and  eight  additional  interrupt  request 
lines  are  available  to  the  user  for  direct  interfaces  to 
user  designated  peripheral  devices  via  the  Multibus  interface. 
One  interrupt  request  line  may  be  jumper  routed  directly  from 
a  peripheral  via  the  parallel  I/O  driver/terminator  section 
and  one  power  fail  interrupt  may  be  input  via  auxiliary 
connector  P2. 

The iSBC 8612A board includes the resources
for supporting a variety of original equipment manufacturer
system requirements. For those applications requiring
additional processing capacity and the benefits of
multiprocessing (i.e., several CPUs and/or controllers
logically sharing systems tasks with communication over the
Multibus interface), the iSBC 8612A board provides full bus
arbitration control logic. This control logic allows up to
three bus masters (e.g., combination of iSBC 8612A board, DMA
controller, diskette controller, etc.) to share the Multibus
interface in serial (daisy-chain) fashion or up to 16 bus
masters to share the Multibus interface using an external
parallel priority resolving network.

The Multibus interface arbitration logic
operates synchronously with the bus clock, which is derived
either from the iSBC 8612A board or can be optionally
generated by some other bus master. Data, however, are
transferred via a handshake between the controlling master and
the addressed slave module. This arrangement allows different
speed controllers to share resources on the same bus, and
transfers via the bus proceed asynchronously. Thus, the
transfer speed is dependent on transmitting and receiving
devices only. This design prevents slower master modules
from being handicapped in their attempts to gain control of
the bus, but does not restrict the speed at which faster
modules can transfer data via the same bus. The most obvious
applications for the master-slave capabilities of the bus
are multiprocessor configurations, high speed direct memory
access (DMA) operations, and high speed peripheral control,
but are by no means limited to these three.

Adding the optional iSBC 300 Multimodule
RAM to the iSBC 8612A board allows the on-board RAM to be
expanded by 32K (for an on-board total of 64K). If the
optional iSBC 340 Multimodule EPROM is installed on the iSBC
8612A board, the amount of on-board ROM/EPROM can be expanded
by 16K (for an on-board total of 32K).

b.  Special  Processing  Elements 

Special  purpose  processing  elements  will  also 
be  used  in  this  system  to  enhance  processing  capabilities. 
Typical  examples  are  array  processors,  FFT,  correlators,  etc. 
However,  they  have  not  been  included  in  this  thesis  project. 

c.  Memories 

Three  types  of  memories  are  provided. 

(1) Secondary Memory. It consists of two
magnetic cartridge hard discs and a dual drive floppy diskette
system. The magnetic hard disc is manufactured by the DYNEX
Company and has a storage capacity of 10 megabytes. This
hard disc system is connected to the system Multibus, which
allows a fast data transfer rate, and it has DMA capability.
Its interface to the Multibus is made by the Interphase Corp.
The dual floppy diskette drive is a part of the Intel MDS-220
development system.


(2) Primary Memory. It consists of dynamic
RAM and EPROM (Erasable Programmable Read Only Memory). The
EPROMs reside in each SBC (8K bytes to 16K bytes per SBC).
They can be used for monitor storage and to store part of the
operating system. The RAMs reside in two types of physical
locations. The first location is on each SBC and has a
capacity of up to 64K bytes. The second type of location is on
separate RAM boards. A 128K byte RAM board developed by the
MUPRO Company is used. The RAM in the SBC is a dual ported
RAM which can be shared with other SBCs via the Multibus
interface. Part or all of the dual ported RAM can be made
accessible only to the on-board CPU; in other words, made
"private" and "unshared" to the SBC. The stand-alone RAM
boards are shared with other SBCs via the Multibus interface.

d. Memory Hierarchy

The primary memory of this system is partitioned
according to the following hierarchical scheme.

A)  Private  Unshared  Memory  -  RAMs  available  on  each  SBC 
which  can  be  accessed  only  by  the  on-board  CPU. 

B) Internal Global Shared Memory - Internal global
shared RAM available on each SBC and special RAM
boards. The on-board RAM in the SBC is a dual
ported RAM and can be accessed by any SBC which
is a member of that cluster (inaccessible to PEs
in other clusters). See Section C.3.a.1.

C) External Global Shared Memory - External global
shared RAMs reside in special RAM boards and/or in
dual ported RAM of the SBCs. These memories can be
accessed by any SBCs in the same "star," and any
SBCs in the corresponding clusters in neighboring
stars.

Using this memory hierarchy, the total address
space can be expanded beyond the physical memory address space
of each CPU. The 8086 microprocessor has 20 address lines, so
its physical address space is $2^{20} = 1,048,576$ bytes, or
1M bytes.

In this implementation, the total address space
(memory space) for a single star is partitioned in the
following way:


                             6 µC in each cluster              8 µC in each cluster

(1) Private Memory           2 · 64K + 4 · (64K - 8K)          4 · 64K + 4 · (64K - 8K)
                             = 352K bytes/cluster*             = 480K bytes/cluster
                             = 2 · 65,536 + 4 · (65,536 - 8,192)
                             = 360,448 bytes/cluster           = 491,520 bytes/cluster

(2) Internal Global Memory   1M bytes · 3/4                    1M bytes · 3/4
                             = 768K bytes/cluster              = 768K bytes/cluster
                             = 786,432 bytes/cluster           = 786,432 bytes/cluster

(3) External Global Memory   32K bytes/cluster                 32K bytes/cluster
                             = 32,768 bytes/cluster            = 32,768 bytes/cluster

* 1K bytes = 1024 bytes

As described before, a "star" consists of four clusters,
thus the total memory space for a single star is:

                             6 µC in each cluster              8 µC in each cluster

                             4 · (352K + 768K + 32K)           4 · (480K + 768K + 32K)
                             = 4,608K bytes/star               = 5,120K bytes/star
                             = 4,718,592 bytes/star            = 5,242,880 bytes/star

This expanded memory space can be determined in general as:

MS = Memory space

CL = Number of clusters in a "star"

PM = Private memory. In K bytes.

GIM = Global internal memory. In K bytes.

GEM = Global external memory. In K bytes.

N = Number of SBCs.

$$MS = CL \cdot \left( \sum_{i=1}^{N} PM_i + GIM + GEM \right) \qquad (3.0)$$

If all SBCs are assigned the same amount of private memory,
then (3.0) becomes

$$MS = CL \cdot (N \cdot PM + GIM + GEM) \qquad (3.1)$$
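
The memory space figures quoted for a single star can be reproduced directly from (3.0). The following sketch uses a hypothetical function name and works in K bytes; the per-SBC private memory sizes (64K, or 64K less 8K) are the ones used in the partition given earlier.

```python
K = 1024  # 1K bytes = 1024 bytes


def memory_space_kbytes(clusters, private_kbytes, gim_kbytes, gem_kbytes):
    """Equation (3.0): MS = CL * (sum of PM_i + GIM + GEM), in K bytes."""
    return clusters * (sum(private_kbytes) + gim_kbytes + gem_kbytes)


# Six SBCs per cluster: two with 64K private, four with 64K - 8K private.
six_per_cluster = memory_space_kbytes(4, [64, 64] + [56] * 4, 768, 32)

# Eight SBCs per cluster: four with 64K private, four with 64K - 8K private.
eight_per_cluster = memory_space_kbytes(4, [64] * 4 + [56] * 4, 768, 32)
```

The two results are 4,608K and 5,120K bytes per star, in agreement with the totals stated above.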

The reason for computing the memory space for 6
microcomputers and for 8 microcomputers in a cluster is mainly
because of power supply considerations. The available power
supply can handle up to 6 SBCs in a cluster. However, the
controller for intercommunication is designed for 8 SBCs.

4.  Intercommunication  Network 

In order to establish fast, reliable and highly
fault-tolerant communication among SBCs of different
clusters and stars, three level communication controllers
were designed, built and tested. They include a combination
of random priority, distributed, and central controllers as
shown in Fig. 3.4 for a single star. Each cluster has its
own distributed controller. Each star has four such
controllers. The four clusters share one central controller.
The four distributed controllers are identical, and have some
degree of programmability.

a.  Distributed  Controllers  (DC) 

A  block  diagram  of  the  distributed  controller  is 
depicted  in  Fig.  3.5.  It  resides  on  a  single  board  located 
in  each  cluster.  Its  primary  functions  are  the  following: 

1)  Arbitration  among  Internal/External  bus  requests 
from  within  and  outside  the  cluster. 

2)  Priority  resolving. 

3)  Inter-cluster  advance  activities  monitoring. 

4)  Interacting  with  the  central  controller. 

5)  Deadlock  avoidance. 

b.  Random  Priority  Controller  (RPC) 

The RPC is a bus contention resolver based on
a binary tree approach. The RPC accepts up to eight "Bus
Requests" (BREQ) and issues a single "Bus Priority In" (BPRN)
signal. BREQ is a signal generated by the bus arbiter which
resides on board the SBC to indicate that this particular SBC
requires control of the cluster system bus (Multibus) for
one or more data transfers. BPRN is a signal generated by
the RPC to indicate to the requesting SBC that control of the
cluster bus is granted. Prior to issuing a BPRN, the RPC
generates an "advanced bus priority in" signal (BPRN*), which
is sent to the ICAAM (intra-cluster advance activities monitor)
as a "port selector" signal. This signal starts a chain of
logical activities which eventually causes the DAC (deadlock
avoidance circuit) to send two signals, BHD (bus hold) and
PRE (priority enable), to the RPC. When the appropriate BHD
and PRE are received by the RPC, it generates the BPRN signal.
BHD is a positive logic signal which enables the tristate
output of the RPC, allowing BPRN* to propagate and become a
BPRN signal when the PRE signal is enabled. If BHD goes low,
it disables all PRN*. PRE is a negative logic signal which is
generated in the DAC. When the PRE signal is generated, it
disables requests from other clusters and enables the output
driver of the RPC to send the BPRN.

Figure 3.4 Diagram of a Three Level Control for a Four Clusters
Multiple Microcomputer System

The RPC has an internal clock to synchronize its
arbitration function. More details can be found in Section
C.4.b.
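
As an illustration of a binary tree contention resolver over eight BREQ lines, the following is a minimal software sketch. It is not the RPC circuit: the function name is hypothetical, and the random tie-break at each tree node is an assumption suggested only by the name "random priority".

```python
import random


def rpc_grant(breq, rng):
    """Pick one requester from eight BREQ lines with a 3-level binary tree.

    breq: list of 8 booleans. Returns the granted index, or None if idle.
    Each node forwards the side that has a request; when both sides request,
    it picks one with a random bit (assumption, not taken from the thesis).
    """
    candidates = [i if req else None for i, req in enumerate(breq)]
    while len(candidates) > 1:
        next_level = []
        for left, right in zip(candidates[0::2], candidates[1::2]):
            if left is not None and right is not None:
                next_level.append(left if rng.random() < 0.5 else right)
            else:
                next_level.append(left if left is not None else right)
        candidates = next_level
    return candidates[0]


rng = random.Random(0)
grant = rpc_grant([False, True, False, False, True, False, False, True], rng)
print("granted BREQ line:", grant)
```

Whatever the tie-break rule, the tree structure guarantees that at most one line is granted per arbitration, which is the property the single BPRN output relies on.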

ICAAM  (Intra-Cluster  Advance  Activities  Monitor) 
has  a  multiplexer  which  selects  two  signals,  MSBT  (most 
significant  address  bits,  5  bits  out  of  20)  and  ADRDC/ADWTC 
(advance  read  command/ advance  write  command)  when  a  BPRN* 
is  received  from  the  RPC.  By  analysing  the  MSBT,  the  ICAAM 
generates  a  bus  request  of  one  of  the  following  types: 


1)  Intra-cluster bus request. It is a request for the
system bus in the same cluster only. In response
to this request, the ICAAM generates an IREQ signal.

2)  Inter-cluster bus request. It is one of four
cluster requests (CLREQ*) generated by the ICAAM of
the distributed controller. Each CLREQ* requests
three resources: one system bus of the requesting
cluster, one system bus of the requested cluster
and one inter-connecting bus switch. Following a
CLREQ*, the ICAAM also creates an EXREQ for the CIC
(coincidence inhibit circuit).

3)  Inter-star bus request. This request, labeled
STREQ*, involves three resources: the system bus
of a cluster in the requesting star, the system
bus of the corresponding cluster in the requested
star, and the inter-connecting bus switch between
these two stars. Following a STREQ* signal, the
ICAAM also creates an EXREQ for the CIC.
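
The classification step can be sketched as a decode of the five most significant bits of the 20-bit address. The actual address map is not given in this section, so the code sets below are hypothetical placeholders supplied by the caller, and the function names are illustrative only.

```python
INTRA, INTER_CLUSTER, INTER_STAR = "IREQ", "CLREQ*", "STREQ*"


def msbt(address):
    """Return the five most significant bits of a 20-bit address."""
    return (address & 0xFFFFF) >> 15


def classify(address, cluster_codes, star_codes):
    """Map an address to a request type.

    cluster_codes / star_codes are hypothetical sets of MSBT values that
    select another cluster or another star; anything else stays local.
    """
    top = msbt(address)
    if top in star_codes:
        return INTER_STAR
    if top in cluster_codes:
        return INTER_CLUSTER
    return INTRA


# Hypothetical map: MSBT 24 selects another cluster, MSBT 28 another star.
print(classify(0x00010, {24}, {28}))
print(classify(0xC0000, {24}, {28}))
print(classify(0xE0000, {24}, {28}))
```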

The ICAAM also generates an advance read command
(ADRDC) or advance write command (ADWTC) before the
corresponding read command (MRDC) or write command (MWTC) is
generated by the bus controller of the requesting SBC. This
is done by monitoring the activities of the CPU of the
requesting SBC before that SBC is granted the system bus.
These signals are needed to set the direction of the drivers
in the bus switch in advance, so that all switching transients
have settled before a data transfer takes place.


CIC (Coincidence Inhibiter Circuit) - The CIC
accepts five signals as inputs: one STPRN (star priority in)
and three CLPRN (cluster priority in) from the central
controller, and one IREQ/EXREQ from the ICAAM. It generates
one output signal, INH (inhibit), for the DAC (deadlock
avoidance circuit). The primary function of the CIC is to
inhibit a BPRN from the RPC when a CLREQ* or STREQ* has been
issued by the ICAAM, until either a CLPRN or a STPRN is
granted by the central controller to the CIC. The INH signal
prevents the system bus from being tied up while the
inter-cluster request is waiting to be granted, which allows
efficient bus usage and reduces bus contention.

DAC (Deadlock Avoidance Circuit) - A "deadlock"
is a situation in which two processes are unknowingly waiting
for resources that are held by each other and are thus
unavailable [192]. More details can be found in Sections
C.5.d and C.5.e. The primary function of the DAC is to
prevent deadlock. Its principle is similar to the "Suspend
Lock" method [193]. The DAC accepts four input signals,
AxNREQ (any request), INH, STREQ and CLREQ, and generates
three signals: BHD (bus hold), PRE (priority enable) and
CL/STPRN. Three cases are described below to explain the
operation of the DAC, depending on the order in which the
CLREQ (or STREQ) and INH signals occur.

(Case 1) - If a CLREQ (or STREQ) occurs prior to
the INH signal, the CL/STPRN signal will be granted. In this
case, BHD will go low and PRE high, thus freezing the selected
request in the RPC and disabling the BPRN*, which releases
all the resources held by the appropriate SBC via the BPRN*
signal (ICAAM, CCU-I). About 30 nsec later, a CL/STPRN
will be generated by the DAC. This allows the appropriate
processing element to be granted the system bus.

(Case 2) - If a CLREQ (or STREQ) signal occurs
after the INH signal, the CL/STPRN signal will be blocked,
which indicates that the system bus is in use. In this case,
BHD is high and PRE goes low, so BPRN will be granted.

(Case 3) - If the INH signal and the CLREQ (or STREQ)
signal occur simultaneously, i.e. within a time window of
15 nsec, the CLREQ (or STREQ) signal will be blocked as in
Case 2. If a transient CL/STPRN signal occurs, the
"GLITCH KILLER" will suppress it and prevent the transient
from propagating to the central controller.
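
The three cases can be summarized as a small decision function. This is a behavioural sketch only, not the DAC circuit: the function name is hypothetical, signal levels are returned as 1 for high and 0 for low, and treating a difference of exactly 15 nsec as inside the window is an assumption.

```python
WINDOW_NS = 15  # coincidence window quoted in the text


def dac_decide(t_req_ns, t_inh_ns):
    """Return BHD, PRE and the CL/STPRN outcome from two arrival times.

    t_req_ns: arrival time of CLREQ (or STREQ); t_inh_ns: arrival of INH.
    """
    coincident = abs(t_req_ns - t_inh_ns) <= WINDOW_NS
    if coincident or t_req_ns > t_inh_ns:
        # Cases 2 and 3: external request blocked, local BPRN proceeds.
        return {"BHD": 1, "PRE": 0, "CL/STPRN": False}
    # Case 1: external request wins, local BPRN* is disabled.
    return {"BHD": 0, "PRE": 1, "CL/STPRN": True}


print(dac_decide(0, 100))    # request clearly first
print(dac_decide(100, 0))    # request clearly second
print(dac_decide(100, 110))  # inside the 15 nsec window
```
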
c.  Central  Controller  (CC) 

The  central  controller  is  a  single  board  control¬ 
ler,  which  consists  of  two  clocks  and  four  identical  units, 
each  corresponding  to  one  cluster  in  the  star.  The  primary 
functions  of  the  CC  are: 

1)  To  arbitrate  among  different  CLREQ  and  STREQ  to  a 
single  cluster. 

2)  Enable  and  disable  the  CL/STPRN  signal  chain. 

3)  Enable  and  disable  the  appropriate  bus  switch  links 
of  the  complete  star  switch. 

A  block  diagram  of  the  CC  is  presented  in  Fig.  3.6. 

CLK-1 - Clock 1 is the main clock of the central
controller. Its frequency is 30 MHz. It is used to
synchronize and enable the arbitration function of the CSRA
(cluster/star request arbiter) and the four-phase clock, CLK-2.

CLK-2 - Clock 2 is a four-phase, anti-coincidence
clock. Its input is CLK-1, from which it generates four
clocks, one for each of the four CSRAs. The functions of the
four-phase clock are:

1)  To synchronize the CLREQ (or STREQ) chain action via
the CSRA in order to prevent deadlocks. The deadlock
avoidance method used in this implementation is similar
to the "spinning lock" method [192]. The spinning
lock rotates at a frequency of 3.75 MHz (30/8 MHz).

CSRA  (Cluster/Star  Request  Arbiter)  -  The  CSRA 
is  a  rotating  priority  resolver.  Its  primary  functions  are: 

1)  To  arbitrate  among  requests  from  three  other  clusters 
within  the  same  star  and  from  the  corresponding 
cluster  in  the  neighboring  star. 

2)  To  enable  the  selected  request,  after  being  synchro¬ 
nized  with  the  spinning  lock,  to  propagate  to  the 
requested  cluster. 

The  CSRA  accepts  four  different  requests  to  a  single  cluster 
and  grants  one  of  them  according  to  a  rotating  priority  scheme. 
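
A rotating priority scheme over four request lines can be sketched as follows. The class name and the exact rotation rule (the line after the last winner becomes highest priority) are assumptions; the text only states that priority rotates.

```python
class RotatingArbiter:
    """Round-robin style arbiter over n request lines (illustrative only)."""

    def __init__(self, n=4):
        self.n = n
        self.next_first = 0  # line that currently has highest priority

    def grant(self, requests):
        """Return the granted line index, or None if nothing is requested."""
        for offset in range(self.n):
            line = (self.next_first + offset) % self.n
            if requests[line]:
                # Assumed rule: priority moves to the line after the winner.
                self.next_first = (line + 1) % self.n
                return line
        return None


arb = RotatingArbiter()
grants = [arb.grant([True, True, True, True]) for _ in range(5)]
print(grants)
```

Under sustained requests from all four sources, each source is served once per rotation, so no requester can be starved by the others.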

CSPE  (Cluster/Star  Priority  In  Enable)  -  the  CSPE 
is  a  demultiplexer  whose  primary  function  is  to  enable  the 
CL/STPRN  chain  action.  The  CSPE  is  synchronized  by  the  CSRA. 
When  a  CLPRN  is  received  from  the  requested  cluster,  the  CSPE 
will  enable  the  CLPRN  chain  action  to  the  selected  requesting 
cluster. 


SSEC (Star Switch Enable Circuit) - The SSEC
consists of a set of six drivers. It accepts the different
CLPRNs and generates the signals ECC, DIR and $\overline{DIR}$.
ECC is a negative logic signal which enables the bus switch
link corresponding to the CLPRN signal. DIR is a signal
which sets the requesting direction of the drivers in the
selected link of the "complete star" bus switch.
$\overline{DIR}$ is the inverted DIR signal. The SSEC is
responsible for enabling the six different links of the
complete star bus switch, as depicted in Fig. 3.7.
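
The six links follow from connecting every pair of the four clusters, i.e. 4·3/2 = 6. The sketch below enumerates the links and selects one for a given request; the link numbering and the DIR encoding (1 when the requester is the first endpoint of the pair) are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def star_links(clusters):
    """All unordered cluster pairs: one bus switch link per pair."""
    return list(combinations(clusters, 2))


def select_link(requesting, requested, clusters):
    """Return (link_index, DIR) for a request between two clusters.

    DIR = 1 when the requester is the first endpoint of the link
    (hypothetical encoding, not taken from the thesis).
    """
    links = star_links(clusters)
    pair = tuple(sorted((requesting, requested), key=clusters.index))
    return links.index(pair), 1 if pair[0] == requesting else 0


clusters = ["A", "B", "C", "D"]
print(star_links(clusters))
print(select_link("A", "B", clusters))
print(select_link("D", "B", clusters))
```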

5.  Intercommunication  Procedures  Among  Resources 

Communication among the resources of this system is
governed by the following basic concepts: explicitly
segmented memory; a memory hierarchy of unshared local memory
and shared global internal/external memory; an asynchronous
process structure; and a design decision that each single
board computer may use the system bus to transfer only one
word of data and must then release the system bus to other
SBCs, except when a prefix lock is executed by software. A
software lock grants the bus to that SBC for any length of
time it needs. In general, this feature is not required
frequently, so the operating system will not normally be
delayed waiting for the system bus to be released in order to
test a semaphore or any other synchronization primitive.

In  order  to  provide  effective  communication  among  all 
processing  elements  (within  a  single  cluster,  among  different 
clusters  in  a  single  "star,"  and  among  "stars")  and  to  arbi¬ 
trate  the  contention  of  bus  usage  (in  star  bus  switch  and 
inter-star  bus  switches),  we  have  developed  an  intercommuni¬ 
cations  system  managed  by  distributed  and  central  controllers, 
as  described  in  Chapter  III. D. 4., 5. 

In order to describe the communication protocol among
different SBCs, a two "star" system (STAR-1 and STAR-2) is
chosen, as depicted in Fig. 3.8. Several examples of
different types of communication are presented.

Figure 3.7 Diagram for the "Complete Star" Bus Switch Network
(Example: 4 Clusters, 6 Bus Switches)

a.  Example  #1  -  Intra-Cluster  Communication 

Intra-cluster  communication  is  accomplished  by 
means  of  data  transfer  via  the  cluster  Multibus.  This  type 
of  communication  does  not  involve  the  central  controller  or 
any  bus  switch.  The  distributed  controller  resident  in  the 
specific  cluster  and  on-board  SBCs  are  the  controllers  of 
this  communication  link. 

For  example,  let  us  assume  SBC-1  in  cluster  A1 
requests  some  information  from  SBC-2  in  the  same  cluster. 
The  sequence  of  events  (Fig.  3.9)  is: 

a)  SBC-1  generates  BREQ  signal. 

b)  The RPC of the distributed controller will grant
the request and generate a BPRN* signal.

c)  The ICAAM of the distributed controller will
generate an IREQ signal for the inhibiter.

d)  From the IREQ, the CIC generates an inhibit
signal which causes the DAC to send appropriate
BHD and PRE signals.

e)  These  two  signals  are  sent  to  the  RPC  to  close  the 
chain  and  a  BPRN  is  generated. 

f)  The  BPRN  signal  is  applied  to  the  arbiter  circuit 
of  the  corresponding  SBC.  From  this  point,  a 
regular  Multibus  transfer  is  executed. 

These six events are necessary to establish any
intra-cluster communication, but they are not sufficient.
The following conditions, concerning requests from other
clusters and stars, must also be examined:

1)  Is any other cluster in the process of communicating
with this cluster?

2)  Is any other star in the process of communicating
with this cluster?

For simplicity, this example assumes that no external
requests were involved in the process of intra-cluster
communication.

Upon termination of the data transfer via the
system bus, SBC-1 releases its BREQ signal, which releases
all resources held by SBC-1. The average time of a word
transfer is 1.65 µsec.

Figure 3.8 Diagram Showing Inter-Star and Intra-Star Interconnections
Using Bus Switches

ISBSW : Inter-Star Bus Switch
SBSW : Star Bus Switch (Intra-Star)

b.  Example #2 - Inter-Cluster Communication
(within a Star)

Inter-cluster  communication  is  accomplished  by 
means  of  data  transfer  via  two  clusters'  system  buses  (Multi¬ 
bus)  and  the  bus  switch  interconnecting  those  two  clusters. 
This  type  of  communication  involves  all  controllers,  the  star 
bus  switch,  and  the  on-board  SBC  arbiter.  (See  Fig.  3.10). 

Assume that SBC-1 in cluster A1 requests some
information from SBC-1 in cluster B1. The sequence of events
is:

1)  SBC-1 of A1 generates a BREQ signal.

2)  The RPC of the distributed controller in cluster A1
locks on the request and generates a BPRN* signal.

3)  The BPRN* signal is applied to the ICAAM of the
distributed controller.

4)  The ICAAM generates two signals: CLREQ-B1, which
propagates to the rotating priority arbiter of
unit B of the central controller, and EXREQ, which
is applied to the CIC (coincidence inhibiter) of the
distributed controller of cluster A1.

5)  The CIC generates an appropriate INH signal which
causes the distributed controller in cluster A1 to
wait for a CLPRN from the demultiplexer of unit B
of the central controller.

6)  The cluster/star request arbiter in the central
controller locks on the CLREQ-B1 signal and waits
for the spinning lock to lock on the request and
enable the CLREQ chain action.

7)  The CLREQ signal is applied to the DAC of the
distributed controller of cluster B1.

8)  The DAC of the distributed controller of cluster B1
generates a CLPRN signal which is applied to the
demultiplexer of unit B of the central controller.

9)  The central controller enables the CLPRN signal to
the DAC of the distributed controller in cluster
A1, which generates appropriate BHD and PRE signals.

10)  The BHD and PRE signals are applied to the RPC and
close the chain action. The RPC then generates
the BPRN signal.

11)  The BPRN signal is applied to the on-board SBC-1
arbiter, which starts the regular Multibus
communication.

12)  After event #9, a parallel process is initiated.
This process is the bus switch enable. Two signals,
DIR and ECC, are sent to the bus switch which links
the buses of cluster A1 and cluster B1.

13)  Those two signals prepare the switch for the coming
data transfer.

The initialization of the bus switch terminates
200 nsec before the transfer of data via the bus switch.
This makes the bus switch transparent to the requesting
cluster, and both clusters are linked on a longer system
bus for the time the transfer takes place. SBC-1 in cluster
A1 can use the "longer" system bus (two system buses plus the
bus switch) for more than one word transfer if this feature
is requested by a software bus lock instruction from SBC-1.
Termination of this process is started when SBC-1 of cluster
A1 releases the BREQ signal. This event releases all
resources held by SBC-1 of cluster A1.

The sequence of events described in this example
is necessary for this type of communication. Other external
events were not introduced, in order to simplify the example.
This sequence of events takes place in an average time of
2.1 µsec.
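
To put the two average times side by side, the following sketch converts them to an upper bound on one-word transfers per second. It assumes that the 2.1 µsec figure, like the 1.65 µsec figure of Example #1, covers a single one-word transfer with no contention; the text does not state this explicitly, and the names are illustrative.

```python
def words_per_second(transfer_time_us):
    """Upper bound on back-to-back one-word transfers for a given time."""
    return 1.0 / (transfer_time_us * 1e-6)


INTRA_US = 1.65  # average intra-cluster word transfer (Example #1)
INTER_US = 2.1   # average inter-cluster sequence (Example #2)

intra_rate = words_per_second(INTRA_US)
inter_rate = words_per_second(INTER_US)
overhead = INTER_US / INTRA_US - 1.0  # relative cost of crossing a switch

print(round(intra_rate), "words/s intra-cluster")
print(round(inter_rate), "words/s inter-cluster")
print(f"{overhead:.1%} extra time per word")
```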

c.  Example  #3  -  Inter-Star  Communication 

Inter-star  communication  is  accomplished  by  means 
of  data  transfer  via  the  system  buses  of  two  clusters  and  the 
bus  switch  interconnecting  these  two  clusters.  This  type  of 
communication  involves  all  controllers,  and  the  bus  switch 
interconnecting  the  two  clusters.  The  sequence  of  events  is 
similar  to  the  previous  example.  Instead  of  the  CLREQ  signal, 
a  STREQ  signal  is  applied  to  the  central  controller.  The 
responding  signal  is  STPRN.  (See  Fig.  3.11). 

Examples 1, 2, and 3 described cases of separable
communication levels. In a real application, the situation
can be more complicated. For example, a simultaneous
combination of the three different examples is possible. In
such a case, deadlocks could occur frequently [193]. In
order to prevent those deadlocks, two methods of deadlock
avoidance are used.

Figure 3.11 State Diagram of Inter-Star Communication


"Suspend  Lock"  -  This  method  is  implemented  in 
the  DAC  of  the  distributed  controller.  In  order  to  explain 
how  this  method  works,  the  following  example  is  used. 

d.  Example  #4  -  Deadlock  Avoidance  I  - 
Suspend  Lock 

SBC-i in cluster A1 of star 1 requests SBC-j in
cluster A2 of star 2 (process P1), and SBC-k in cluster A2 of
star 2 requests SBC-ℓ in cluster A1 of star 1 (process P2),
where 8 ≥ i, j, k, ℓ ≥ 1. Let us assume that at time $T_0$ the
two request processes P1 and P2 have progressed to state
No. 3 (Fig. 3.12). At this point of execution, the processes
P1 and P2 are holding the following resources:

P1: {RPC-DC-A1, ICAAM-DC-A1, CSRA/CCB1, DAC-A1, CIC-A1}

P2: {RPC-DC-A2, ICAAM-DC-A2, CSRA/CCA2, DAC-A2, CIC-A2}

At this point of execution, each process requests the DAC
located in the other distributed controller. But the two
DACs are held by the requesting processes and are unavailable.
It seems that we have a deadly embrace situation (deadlock).
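
The deadly embrace can be made explicit as a cycle in a "waits for" graph built from the holdings and requests above. This is a generic illustration of the deadlock condition, not a model of the hardware; only the two DACs are included, and the function name is hypothetical.

```python
def find_cycle(waits_for):
    """Return a list of processes forming a wait cycle, or None.

    waits_for maps each process to the process that holds the
    resource it is requesting.
    """
    for start in waits_for:
        seen = []
        node = start
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node == start and seen:
            return seen
    return None


holders = {"DAC-A1": "P1", "DAC-A2": "P2"}   # who holds each DAC
requests = {"P1": "DAC-A2", "P2": "DAC-A1"}  # what each process wants next
waits_for = {proc: holders[res] for proc, res in requests.items()}

cycle = find_cycle(waits_for)
print("deadlock cycle:", cycle)
```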

The DAC is designed to avoid such a case. One
of the DACs (called the first DAC, according to the time of
arrival of the requests) will suspend the lock of the second
DAC by releasing some of the resources that are held by the
second requesting process. In this way the first requesting
process advances while the second is suspended and waits for
the first process to terminate. Without the suspend lock
method, this deadlock could happen when the two requesting
clusters are located in different stars, because the spinning
locks of the two central controllers are not synchronized.
Therefore, the spinning lock function is of limited use for
inter-star communication. This is the reason for having two
types of deadlock avoidance methods. The suspend lock method
is used to prevent deadlock in inter-star communication.
Synchronizing the spinning locks of the different central
controllers of a multi-star system is not desirable for fault
tolerance, and sometimes it may not be possible.
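
The suspend lock idea can be sketched as follows: the request that arrived later gives up some of its resources so that the earlier one can finish. Which resources are released is not specified in the text, so the `released` argument is a hypothetical choice made by the caller, and the function itself is only an illustration.

```python
def suspend_lock(held, arrival_ns, released):
    """Resolve a two-process embrace by suspending the later arrival.

    held: process -> set of resources it holds
    arrival_ns: process -> time at which its request arrived
    released: resources the suspended process gives up (hypothetical)
    Returns (execution_order, holdings_after_suspension).
    """
    first, second = sorted(held, key=lambda proc: arrival_ns[proc])
    after = {proc: set(res) for proc, res in held.items()}
    after[second] -= set(released)
    return [first, second], after


held = {"P1": {"DAC-A1", "CIC-A1"}, "P2": {"DAC-A2", "CIC-A2"}}
order, after = suspend_lock(held, {"P1": 10, "P2": 12}, {"DAC-A2"})
print("order:", order)
print("P2 holds after suspension:", sorted(after["P2"]))
```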

The  second  method  of  deadlock  avoidance  is  the 
"spinning  lock"  method.  This  method  is  used  to  prevent 
deadlocks  which  may  occur  in  inter-cluster  or  intra-cluster 
communication  within  the  same  star.  If  for  any  reason  this 
method  fails  to  prevent  a  deadlock,  the  "suspend  lock"  method 
will  take  over  and  prevent  the  deadlock.  The  reason  for 
using  two  different  methods  is  to  reduce  the  overhead  created 
by  the  suspend  method  and  to  increase  fault  tolerance. 

CLK-2  in  the  central  controller  is  a  four-phase 
anti-coincidence  clock  as  shown  in  Fig.  3.22.  This  clock  is 
the  "spinning  lock"  generator. 

e.  Example #5 - Deadlock Avoidance II -
Spinning Lock (Fig. 3.12)

Let us assume that SBC-i in cluster A requests
SBC-j in cluster B, and SBC-k in cluster B requests SBC-ℓ in
cluster A. These requests are all for SBCs residing in the
same "star." If the two requests are sent simultaneously to
the CSRA of CCA and the CSRA of CCB, respectively, of the
central controller, they will eventually progress to the
deadlock condition explained in Example #4. In order to
prevent such a possibility, the CSRA of the central
controller is designed with two "lock in request" phases.

1)  The first phase is implemented by the rotating
priority arbiter.

2)  The request selected by the first arbiter propagates
to the "spinning lock" circuit, which will lock on
the request only when CLK-2 goes low.

Figure 3.12 State Diagram for A Deadlock Example

CLK-2 has four phases. Since only one phase goes low at any
given time, it is impossible for both requests to leave the
central controller for the distributed controllers of the
requested clusters at the same time, which eliminates the
race condition and the deadlock. A race condition occurs when
the scheduling of two processes is so critical that the
various orders of scheduling them result in different
processing [192]. The minimum time difference caused by the
spinning lock to the requesting process is equal to the
anti-coincidence time $t_{ac}$ of CLK-2 (Fig. 3.22).
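
The serializing effect of the four-phase clock can be sketched in software. The numbers 30 MHz and 30/8 = 3.75 MHz come from the text; the phase layout used here (each phase low for one CLK-1 tick, followed by a one-tick gap) and the assignment of phase 0 to CCA and phase 1 to CCB are assumptions made only for illustration.

```python
CLK1_MHZ = 30.0
CYCLES_PER_ROTATION = 8  # 30 MHz / 8 = 3.75 MHz rotation rate

rotation_mhz = CLK1_MHZ / CYCLES_PER_ROTATION
tick_ns = 1000.0 / CLK1_MHZ


def active_phase(tick):
    """Phase (0..3) that is low during this CLK-1 tick, or None in a gap.

    Assumed layout: even slots carry a phase, odd slots are the gap.
    """
    slot = tick % CYCLES_PER_ROTATION
    return slot // 2 if slot % 2 == 0 else None


def lock_tick(arrival_tick, phase):
    """First tick at or after arrival when the given phase is low."""
    tick = arrival_tick
    while active_phase(tick) != phase:
        tick += 1
    return tick


# Two requests reach the CSRAs of CCA (phase 0) and CCB (phase 1) together.
lock_a = lock_tick(0, 0)
lock_b = lock_tick(0, 1)
print(rotation_mhz, "MHz rotation;", "locks at ticks", lock_a, "and", lock_b)
```

Because no two phases are low in the same tick, two simultaneous requests are always locked at different times, which is the ordering the spinning lock relies on.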

6.  Multibus  Communication 

Two  arbitration  circuits  are  used  in  the  Multibus 
communication:  the  on-board  SBC  arbiter  called  Bus  Arbiter 
and  the  RPC  of  the  distributed  controller. 

The Bus Arbiter provides several resolving techniques
based on a priority concept: at any given time one SBC has
priority over all the rest. The RPC can be regarded as
a parallel priority resolver. A parallel priority resolving
technique has a separate bus request (BREQ) line for each
arbiter on the system bus (Multibus). The BREQ lines enter
the RPC input. For each BREQ line, there is a corresponding
BPRN (bus priority in) line at the output of the RPC.
Only one BPRN signal can be activated at any given time.
This BPRN signal is returned to the highest priority
requesting bus arbiter. The bus arbiter receiving priority
(BPRN active low) then allows its associated SBC onto the
multi-master system bus as soon as the bus becomes available
(i.e., it is no longer busy). When one bus arbiter gains
priority over another arbiter, it cannot immediately seize
the bus. It must wait until the present bus occupant
completes its transfer cycle. Upon completing its transfer
cycle, the present bus occupant recognizes that it no longer
has priority (BPRN goes high) and surrenders the bus,
releasing the Busy signal. Busy is an "active low" signal
line which goes to every bus arbiter on the system bus and is
tied with the other busy signals by an "OR" gate. When
"Busy" goes high, the arbiter which presently has bus
priority (BPRN active low) seizes the bus and pulls "Busy"
low to keep other arbiters off the bus. (See the waveform
timing diagram, Fig. 3.13.) Note that all multi-master system
bus transactions are synchronized to the bus clock (BCLK).
This gives the parallel priority resolving circuit time to
settle and make a correct decision. Fig. 3.14 depicts the
interconnections between the bus arbiters and the RPC.

In our configuration, every master currently using
the bus will surrender the bus upon completing its transfer
cycle (unless a bus lock is executed). This property is
accomplished by tying the CBREQ (common bus request) lines
of all bus arbiters to ground. CBREQ is an active low signal
which indicates to the current master on the bus that the bus
has been requested by another master.
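
The one-word-per-grant behaviour can be sketched as a simple loop in which the bus owner is re-selected after every transfer. This abstracts away BPRN, Busy and BCLK timing entirely; the function names are hypothetical, and the selection rule in `pick_other` (prefer a requester other than the last owner) is an assumption standing in for the RPC.

```python
def pick_other(requesters, last_owner):
    """Prefer any requester other than the master that just used the bus."""
    others = [m for m in requesters if m != last_owner]
    return (others or requesters)[0]


def run_bus(pending, pick=pick_other):
    """Return the sequence of bus owners, one word transferred per entry.

    pending: master -> number of words it still wants to transfer.
    """
    pending = dict(pending)
    trace = []
    while any(words > 0 for words in pending.values()):
        requesters = [m for m, words in pending.items() if words > 0]
        owner = pick(requesters, trace[-1] if trace else None)
        pending[owner] -= 1  # the master surrenders the bus after one word
        trace.append(owner)
    return trace


print(run_bus({"SBC1": 2, "SBC2": 2}))
print(run_bus({"SBC1": 3, "SBC2": 1}))
```

When only one master is still requesting, it simply keeps the bus for consecutive words, which corresponds to a master holding the bus in the absence of competing requests.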

Two other signals, LOCK and CRQLCK, add to the
flexibility of the bus arbiter within the system
configuration. LOCK is a signal generated by the processor to
prevent the bus arbiter from surrendering the multi-master
system bus to any other master, of either higher or lower
priority. CRQLCK (common request lock) serves to prevent the
bus arbiter from surrendering the bus to a lower priority bus
master when conditions warrant it. LOCK is used for
implementing software semaphores for critical code sections
and real time critical events (such as memory refresh or hard
disc transfer).

In  the  three  different  types  of  communications  we 
referred  to  the  term  PRN  and  REQ  chains.  The  following  state 
diagrams  depict  those  chains: 

1)  Intra-cluster  communications 


2)  Inter-cluster  communications 


3)  Inter-star  communication 


D.  PRESENTATION OF RESULTS

1.  Introduction

The  important  hardware  components  developed  in  this 
thesis  to  support  this  multiple  microcomputer  system  are  the 
following : 

Interconnection: 

Intra-cluster  --  Multibus 

Inter-cluster  --  Complete-Star  Bus  Switch  Network 

Intercommunication  Control  (three  levels) : 
Random-Priority  Controller 
Distributed  Controller 
Central  Controller 

In this section, we will present representative test
results to answer two major questions:

1)  Did our design work?

2)  How well did it work?

Since the Multibus was developed by Intel and is well
documented [196], we decided not to report its operations here.
We will describe the operational results of the bus switch
and the three levels of intercommunication control.

How  well  they  work  together  in  a  computational 
environment  will  be  reported  in  Chapter  IV  where  the  imple¬ 
mentation  of  an  adaptive  spatial  filter  on  the  multiple 
microcomputer  system  will  be  described. 

2 .  Bus  Switches 

The function of a bus switch is to transmit a signal
from the Multibus in one cluster to the Multibus in another
cluster. For four clusters, the "complete star bus switch
network" we designed has six branches of bus switches, as
shown in Fig. 3.7. Although Intel's Multibus has 86 lines,
we decided that only 58 of them need to be switched to
facilitate communication between two SBCs from different
clusters. Therefore, one "bus switch" includes appropriate
circuits to transmit 58 signals, including data, address and
control signals.

Four  figures  will  be  used  to  describe  the  behavior 
of  the  bus  switch.  The  first  three  figures  are  used  to  show 
the  improvement  of  signal  waveform  before  and  after  the  bus 
switch.  The  signals  shown  are  the  following: 


One  data  bit  -  Fig.  3.15a 

One  address  bit  -  Fig.  3.15b 
One  control  signal  -  Fig.  3.15c 

Each figure consists of two traces. The top trace shows the
waveform before the switch. The lower trace shows the
waveform after the switch. It can be seen that in all three
cases the waveforms after the switch are better because their
rise times are all shorter, giving a sharper pulse. It is
interesting to note the noise appearing on these three
signals, which is typical of the real operational
environment. It should be noted that the control signal in
Fig. 3.15c is the Acknowledge Signal (XACK) generated by the
SBC requesting the use of the system bus.

The behavior of the bus switch is also described by
Fig. 3.20, which shows the delay of the switch. Again, the
top trace is before the switch and the bottom trace is after
the switch. The delay is no more than 25 nsec.

These four figures demonstrate that our bus switches
are adequate to provide communication between two Multibuses
running at 10 MHz.

3.  Random  Priority  Controllers  (RPC) 

The function of the random priority controllers is to
arbitrate the requests for bus usage from many SBCs, either
from the same cluster or from several clusters. If an SBC
from another cluster wants the Multibus to communicate either
with another SBC or with the Global RAM, two higher levels of
controllers (the central controller, and the two distributed
controllers associated with this cluster and with the other
cluster where the requesting SBC resides) must also
participate in the control function. However, the control
ultimately comes to the RPC because it is the circuit which
grants the bus usage signal, BPRN (Bus Priority In). One RPC
is used for every Multibus, so there are four RPCs in each
star.

Figure 3.15 The input and output waveforms of three selected
signals to demonstrate the performance of the bus switch:
(a) a data bit, (b) an address bit, (c) a control signal,
"Acknowledge" (XACK). Top trace: input to the bus switch.
Bottom trace: output of the bus switch.

The  behavior  of  our  RPC  will  be  described  by  four 
figures  using  the  BPRN  signals  (Bus  Priority  In)  of  the  SBCs 
requesting  the  bus.  A  BPRN  low  signal  means  the  SBC  has  been 
granted  the  bus  and  is  using  it. 

a.  Sharing  of  the  Multibus  by  Two  SBCs. 

Fig.  3.16  shows  BPRNs  of  two  SBCs.  The  bus  usage 
pattern  was  created  by  software.  Each  unit  of  low  BPRN  rep¬ 
resents  a  transfer  of  one  word.  If  there  is  no  request  of 
bus  usage  by  other  SBCs,  the  SBC  currently  using  the  bus  will 
hold,  as  shown  by  the  BPRN  low  signal  for  a  longer  period  of 
time.  The  figure  shows  the  interleaving  of  bus  usages  by 
these  two  SBCs,  indicating  that  the  RPC  works  rapidly  and 
efficiently  to  serve  these  two  SBCs. 

b.  Slow-Down  of  Bus  Release  Due  to  Refresh 
of  Dynamic  RAM 

However, we discovered that the SBC using the bus may not release the bus after its one-word transfer, as shown by the wide gap in Fig. 3.17, although the other SBC was requesting the bus. We found that this is inherent in the design of Intel's 8612: when the dynamic RAM is being refreshed, the SBC will not release the bus. This is a drawback we cannot do anything about except to redesign the 8612 SBC.

Figure 3.16  Bus Priority In signals of two SBCs to demonstrate the arbitration of their usage of the bus by the random priority controller. Traces, top to bottom: BPRN of SBC1, BPRN of SBC2.

Figure 3.17  Bus Priority In signals of two SBCs to demonstrate the effect of dynamic RAM refresh on the bus usage.

Figure 3.18  Bus Priority In signals of four SBCs to demonstrate the arbitration of their usage of the bus by the random priority controller. Traces, top to bottom: BPRN of SBC1, SBC2, SBC3, SBC4.

c.  Sharing  of  Multibus  by  Four  SBCs 

Fig. 3.21 shows the BPRN signals of four SBCs. Their general patterns are similar, in the sense that there is no large gap in any one of these traces, indicating that no SBC is dominating the bus and none is being left out either. This "uniform" and "equal" treatment of all SBCs requesting the bus is exactly what the RPC is designed to do.

d.  Behavior  of  RPC  When  the  Bus  is  Saturated 

We prepared the most severe test for the RPC by programming four SBCs to request the bus all the time. Of course, in real applications, this condition should never be allowed to happen. It represents very poor application programming. However, it is a tough test for the RPC. Fig. 3.19 shows the BPRN of four SBCs. The interleaving of bus usage is no different from the previous three figures. However, it is important to note that the bus was first shared by SBC1 and SBC3 for 12 transfers and then shared by SBC2 and SBC4 for another 12 transfers, followed by the repetition of this pattern. Two important properties caused this pattern. First, the RPC is designed based on a binary tree selection. Therefore, only two SBCs will be granted first, followed by another pair. Second, the 12 transfers between SBC1 and SBC3 are determined by the basic design of the 8086 instruction queue, which is a FIFO queue of six instructions.


Figure 3.19  Bus Priority In signals of four SBCs which request the bus usage 100% of the time to demonstrate the function of the random priority controller. Traces, top to bottom: BPRN of SBC1, SBC2, SBC3, SBC4.

Figure 3.20  Waveforms of input and output signals of a bus switch to demonstrate the operation of the switch. Top trace: input signal to the bus switch. Bottom trace: output signal from the bus switch.

Figure 3.21  Bus Priority In signals of four microcomputers, each requesting 20% usage of the Multibus, to demonstrate the operation of the random priority controller in this example of heavy bus requests (80% bus request).


This demonstration clearly indicates that our RPC is able to arbitrate four SBCs under the most demanding bus contention situation, which should never be allowed to occur in a real application.

4.  Central Controller

The  function  of  the  central  controller  is  to  arbitrate 
requests  for  inter-cluster  and  inter-star  communication.  It 
works  jointly  with  the  distributed  controllers  to  search, 
select  and  synchronize  these  requests.  Although  there  is  only 
one  central  controller  for  a  star,  it  has  four  sections,  one 
for  each  cluster  in  the  star. 

The important components of each section in the central controller are CSRA and CSPE. All four sections are synchronized by two clocks: CLK1 for the searching and selecting of requests, CLK2 for their synchronization.

Two figures will be used to demonstrate their operations.

a.  Searching/Selecting  Clock  (CLK1)  and 
Synchronization  Clock  (CLK2) 

These two clocks are the heartbeats of the intercommunication network. It should be realized that CLK2 is not independent because it is generated from CLK1. Fig. 3.22 shows their mutual relationship. The third trace is CLK1. Below it are the four-phase CLK2 signals for the four clusters. It is important to note that there is no overlap among them. This is to avoid any undesirable coincidence. CLK1 is at a higher clock frequency so that all requests from other clusters and stars are searched and selected at adequate rates. Once a request is selected, it is synchronized by CLK2 and sent on to the appropriate cluster.

b.  Searching  and  Selection  of  Requests 

Fig. 3.23 shows the functions of the CSRA and CSPE circuits of section A of the central controller (CCA). Four signals are shown in the top half of the figure, representing the three cluster requests from clusters B, C and D and the request from cluster A of another star, respectively. The lower half of the figure shows the cluster or star grant signals to the other star and to clusters D, C and B, respectively. It is important to note that these CLPRN (or STPRN) signals do not overlap although the request signals do overlap. It can be seen that cluster C sent its CLREQ first and got its CLPRN. However, cluster D sent its CLREQ before cluster C finished its request. Such an occasion is generally not allowed in real applications because any SBC is allowed to transfer one word of data and must then release the bus unless a software bus lock is ordered. However, this test is meant to challenge the ability of the central controller. In this case, the CSRA/CSPE of the CCA will allow cluster C to complete its request period and then award a CLPRN to cluster D. This figure clearly demonstrates that, with a mix of request signals from three clusters and one star, some with overlap and some without, the central controller is able to take in these requests, sort them out, select one at a time and award the "cluster grant" appropriately.


Figure 3.22  Two Clocks in Central Controller for Searching/Selection and Synchronization of Requests from Stars and Clusters. CLK1: for searching and selection.

Figure 3.23  Demonstration of the Functions of CSRA and CSPE Circuits in the Central Controller (Section A for Cluster A). Input to CSRA, output from CSPE. Four request signals to CSRA, top to bottom: from DCB, from DCC, from DCD, from Star A. Four Priority In signals from CSPE, top to bottom: to Star A, to DCD, to DCC, to DCB.

Of course, this is not the completion of the intercommunication task. The CLPRN will be sent to the distributed controller to initiate further control actions to complete the total task of communication between two SBCs.
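The grant behavior described for Fig. 3.23 (overlapping requests in, one non-overlapping grant out at a time) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the CSRA/CSPE circuit: the function name `arbitrate`, the tick-based timing, and the rotating scan order are hypothetical choices made for illustration. Each request is modeled as an arrival time plus the number of ticks the requester holds the grant.

```python
def arbitrate(requests, scan_order):
    """Serve overlapping requests one at a time.

    requests:   dict mapping requester name -> (arrival_tick, hold_ticks).
                A request stays asserted from its arrival until it has been
                granted and held for hold_ticks.
    scan_order: list of requester names in the (assumed) search order.
    Returns a list of (name, grant_start, grant_end) tuples.
    """
    unknown = set(requests) - set(scan_order)
    if unknown:
        raise ValueError(f"requesters missing from scan order: {sorted(unknown)}")

    pending = dict(requests)
    grants = []
    tick = 0
    pointer = 0  # position in scan_order where the next search begins
    while pending:
        chosen = None
        # Search: scan from the pointer for the first asserted request.
        for offset in range(len(scan_order)):
            idx = (pointer + offset) % len(scan_order)
            name = scan_order[idx]
            if name in pending and pending[name][0] <= tick:
                chosen = idx
                break
        if chosen is None:
            # Nothing asserted yet: idle until the earliest arrival.
            tick = min(arrival for arrival, _hold in pending.values())
            continue
        # Select: grant exactly one requester and let it finish.
        name = scan_order[chosen]
        _arrival, hold = pending.pop(name)
        grants.append((name, tick, tick + hold))
        tick += hold
        pointer = (chosen + 1) % len(scan_order)
    return grants


# Hypothetical timing: cluster D asserts its request while C still holds the grant.
demo = arbitrate(
    {"C": (0, 4), "D": (2, 3), "B": (5, 2), "StarA": (5, 2)},
    scan_order=["B", "C", "D", "StarA"],
)
for name, start, end in demo:
    print(f"{name}: granted from tick {start} to tick {end}")
```

In this model cluster D's request overlaps cluster C's, yet D is granted only after C completes, which mirrors the behavior the text attributes to the central controller.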

5.  Distributed  Controller 

The  function  of  the  distributed  controller  is  the 
same  as  that  of  the  central  controller.  They  must  work  with 
the  RPC  to  complete  the  intercommunication.  The  central 
controller  is  located  away  from  the  Multibus  and  also  controls 
the  operations  of  all  bus  switches.  The  distributed  control¬ 
ler  is  mounted  on  the  Multibus.  Therefore,  we  have  four 
distributed  controllers  in  a  star.  The  important  components 
of  each  distributed  controller  are: 

ICAAM  (Intra-cluster  advanced  activities  monitor) 

CIC  (Coincidence  inhibitor  circuit) 

DAC  (Deadlock  avoidance  circuit) 

Four figures will be used to demonstrate their operations. Eight control signals in the distributed controller are used in these figures, shown as traces in the following order:

1.  BREQ
2.  CLREQ*
3.  Internal/External Signal
4.  Inhibit
5.  PRE
6.  BHD
7.  CLPRN
8.  BPRN

The first and eighth control signals, BREQ and BPRN, are two of the most important ones because they are directly connected to the SBCs. We must remember that all the buses, switches and controllers are supporting circuits that help the SBCs to compute and to talk among themselves efficiently. The SBCs are the originators and receivers of the data and of the communication and control signals.

a.  Intra-Cluster  Communication 

Fig.  3.24  shows  the  sequence  of  events  in  a  test 
case  where  one  SBC  in  a  cluster  wants  to  talk  to  another  SBC 
in  the  same  cluster. 

It  can  be  seen  that  CLREQ*  (second  trace)  is  high, 
which  means  no  request  from  another  cluster.  CLPRN  (7th 
trace)  is  therefore  also  high,  i.e.,  no  cluster  priority 
signal  is  granted  by  the  central  controller. 

It  is  interesting  to  notice  the  small  delays 
between  BREQ,  PRE  and  BPRN. 

b.  Inter-Cluster/Intra-Star  Communication 

Fig.  3.25  shows  the  sequence  of  events  in  a  test 
case  where  an  SBC  in  one  cluster  wants  to  talk  to  an  SBC  in 
another  cluster  within  the  same  star. 

There  are  several  interesting  points  when  this 
case  is  compared  with  the  intra-cluster  case: 

°  Both BREQ and CLREQ* exist.

°  The Inhibit signal is active to prevent any premature generation of BPRN.

°  CLPRN is also active to respond to the CLREQ*.

It is clearly seen that this inter-cluster communication has been correctly handled by the distributed controller.

c.  Inter-Star Communication

Fig. 3.26 shows the sequence of events in a test case where an SBC in one cluster of a star wants to talk to an SBC in the corresponding cluster of a neighboring star. The events are quite similar to the inter-cluster/intra-star case in Fig. 3.25, with two changes:

°  The second trace is now the STREQ* signal instead of the CLREQ* signal.

°  The seventh trace is now the STPRN signal instead of the CLPRN signal.

The rest of the signals behave quite similarly. This shows that requests from a cluster in the same star and from a neighboring star are treated in much the same way.




IV.  IMPLEMENTATION OF ADAPTIVE FILTER
ON MULTIPLE MICROCOMPUTER SYSTEM


A.  INTRODUCTION 

1.  Selection  of  Microcomputer 

The  goal  of  this  thesis  research  was  to  eliminate 
the  gap  between  the  theoretical  development  of  image  process¬ 
ing  algorithms  and  the  experimental  development  of  their 
implementation  on  some  processor  systems  which  are  good  can¬ 
didates  for  practical  applications. 

In  this  thesis,  a  multiple  microcomputer  system  was 
chosen  as  the  processor  system  candidate. 

It should be recognized that only during the past two to three years have 16 bit microcomputers been seriously considered for signal processing implementations. Although 8 bit microcomputers have been investigated for performing signal processing operations, the motivation of these studies was mainly to explore what the 8 bit microcomputers can do for signal processing. For serious implementations, bit slice microprocessors, which can be designed to emulate 16 bit, 32 bit or even longer word computers, have always been the favored approach. However, 16 bit microcomputers are being supported with more and more powerful hardware and software and are approaching low-end minicomputer performance.

To examine the signal processing performance of today's 16 bit MOS microcomputer, we coded the statistical 3x3 spatial filter on one main frame computer, the IBM 360/67, and two 16 bit microcomputers, the DEC LSI-11 and the Intel 8612, using high order programming languages and single precision numerical data format. Fortran was used for the IBM and DEC computers; PLM86 was used for the Intel computer. The execution times, expressed in seconds, are shown in Table IV.1 for comparison.

TABLE IV.1

IMAGE PROCESSING EXECUTION TIME
(in seconds)

Image Processing Operation       IBM 360/67   DEC LSI-11   Intel 8612   Intel 8612
                                 Fortran      Fortran      PLM86        Macro
                                 Single       Single       Single       Integer
                                 Precision    Precision    Precision

Spatial Statistics Calculation   4.07         25.46        334.25       0.72
Spatial Filter Design            0.0047       0.24         2.82         -
Perform Spatial Filter           0.98         5.62         79.8         0.47

It can be seen that the LSI-11 has better floating point computation support today than Intel's 8612, which took 13 to 14 times longer than the LSI-11 to perform these image processing operations. The LSI-11 itself took approximately 6 times longer than the IBM 360/67. Based on this comparison, the LSI-11 should be chosen as the 16 bit microcomputer candidate. However, Intel's 8612 was selected because of its larger physical memory addressing space and its system Multibus support, which are much better suited for multiple microcomputer system development.

Further, two of the three spatial filter modules were coded in assembly language with a 32 bit integer data format on the 8612. It was found that the execution times are quite short, suggesting that even today's Intel 16 bit microcomputer, without the assistance of hardware arithmetic devices, can perform these rather sophisticated image processing operations very well when compared with the main frame computer IBM 360/67. More specifically, it took 0.72 seconds to compute the autocorrelation matrix elements for the 3x3 spatial filter, averaged over the 32 x 32 image, and 0.47 seconds to perform this 3x3 spatial filtering over the image.

2.  Implementation

In  this  chapter  we  will  present  the  implementation 
results  of  our  adaptive  filter  on  the  multiple  microcomputer 
system.  In  Section  B,  the  performance  of  spatial  filters  is 
discussed.  In  Section  C,  the  performance  of  adaptive  spatial 
filters  will  be  discussed. 

The  functions  of  various  components  of  the  intercon¬ 
nections  and  communication  controllers  have  been  described  in 
previous  sections  using  mainly  signals  generated  by  function 
generators.  In  this  section,  a  test  program  was  used  to  test 
and  evaluate  the  data  transfer  behaviors  of  the  system.  This 
program  is  quite  straightforward  and  fetches  data  from  the  RAM 
and  displays  them  on  a  CRT  terminal.  However,  the  locations 
of  the  program  and  data  are  at  different  parts  of  the  system 
to  provide  a  thorough  test  of  the  data  transfer  and  bus 
arbitration  behaviors. 


Three  tests  were  made. 


The objectives of the first two tests are to measure the maximum rate of data transfer on the system bus. For this purpose, both the program and data were stored either in the global RAM located in another slave SBC, as in test case 1, or in the global RAM located on the µPRO RAM board, as in test case 2. Therefore, the system bus was kept very busy because not only must the data be fetched via the bus, but the program itself must also be read from memory external to the testing SBC.

TABLE IV.2

MEMORY ALLOCATION FOR MULTIBUS TEST

Test No.   Location of   Location of   Remarks
           Program       Data

1          Slave SBC     Slave SBC     Program and data being run at maximum rate.
2          µPRO RAM      µPRO RAM      Program and data being run at maximum rate.
3          Master SBC    µPRO RAM      Program and data being run at approximately
                                       20% of the maximum rate.

The maximum rates at which this test can run with one to six microcomputers are shown in Table IV.3. Several important facts can be noticed.

(1) The bus transfer rate of each SBC is reduced as more and more SBCs want to use the bus, as it should be.

(2) However, the maximum rate and the amount of reduction vary from test to test. For example, in Test 1 we were able to transfer at most 710 Kbyte/sec when only one SBC was using the bus, as compared with a maximum of 911 Kbyte/sec for one SBC in Test 2. Test 2 showed that it is quicker to get data out of the µPRO than out of the RAM on a different SBC. This can be explained easily: the control logic on the SBC must decide whether the memory addressed is on-board or off-board. This decision takes time and thus slows down the transfer rate. When more SBCs were added in these two tests, the transfer rate of every SBC decreased. However, the rates of decrease were different in Test 1 and Test 2, as shown in Table IV.3. They are also plotted in Fig. 4.1 to give a graphical view. It is obvious that substantial deterioration of the bus transfer rate took place in these two cases, from 710 Kbyte/sec to 144 Kbyte/sec in Test 1 and from 911 Kbyte/sec to 167.1 Kbyte/sec in Test 2.

(3)  It  should  be  pointed  out  that  such  heavy 
usage  of  the  system  bus  should  be  allowed  to  happen  only 
during  tests.  If  a  programmer  prepared  an  application  pro¬ 
gram  with  such  heavy  bus  usage,  he  has  failed  miserably  in 
partitioning  his  program  for  parallel  and  pipeline  computa¬ 
tion  in  the  multiple  microcomputer  system. 

(4) Therefore, to provide a test more compatible with real operational conditions, Test 3 was prepared, which has its program in the RAM of the master SBC and its data in the global RAM in the µPRO. Further, it was run at a rate of 194.9 Kbyte/sec on the bus when only one SBC requested the bus. It can be seen that the deterioration of the system bus transfer rate is much more moderate, from 194.9 Kbyte/sec for one SBC to 132 Kbyte/sec for six SBCs. This is a testimony to the ability of the intercommunication controller to treat all SBCs equally without allowing any one SBC to dominate the bus usage.


TABLE IV.3

SYSTEM BUS TRANSFER RATE (Kbyte/sec) FOR EVERY SBC IN
THREE MULTIPLE MICROCOMPUTER SYSTEM TESTS

No. of SBCs   Test 1   Test 2   Test 3

1             710      911      194.9
2             400.7    522      188
3             277.7    345.33   184
4             212      255.7    166
5             171.8    202.3    147.9
6             144      167.1    132

(5) Further, the overhead loss of transfer rate in arbitrating the bus usage of several microcomputers is small. Let us consider Test 2. The maximum total bus transfer rate took place when there were two SBCs using the bus and was 2 x 522 = 1044 Kbyte/sec. When six SBCs were using the bus, the total transfer rate on the bus was 6 x 167.1 = 1002.6 Kbyte/sec. The loss is only (1044 - 1002.6)/1044 = 0.0397, or 3.97%. Of course, each SBC suffered a loss of (911 - 167.1)/911 = 81.658% in its bus usage rate. It is interesting to note that 167.1 Kbyte/sec for six SBCs is close to one-sixth of the rate of 911 Kbyte/sec obtained when one SBC has the system all to itself.
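The arithmetic in item (5) can be reproduced directly from Table IV.3. The short Python sketch below is illustrative only; the per-SBC rates are copied from the table, while the function names (`aggregate_rates`, `arbitration_loss`, `per_sbc_loss`) are hypothetical helpers introduced here.

```python
# Per-SBC system bus transfer rates (Kbyte/sec) for 1..6 SBCs, from Table IV.3.
RATES = {
    "Test 1": [710, 400.7, 277.7, 212, 171.8, 144],
    "Test 2": [911, 522, 345.33, 255.7, 202.3, 167.1],
    "Test 3": [194.9, 188, 184, 166, 147.9, 132],
}


def aggregate_rates(per_sbc):
    """Total bus traffic for n SBCs = n times the per-SBC rate."""
    return [n * rate for n, rate in enumerate(per_sbc, start=1)]


def arbitration_loss(per_sbc):
    """Fractional drop from the peak total rate to the total rate with all SBCs."""
    totals = aggregate_rates(per_sbc)
    peak = max(totals)
    return (peak - totals[-1]) / peak


def per_sbc_loss(per_sbc):
    """Fractional drop in one SBC's rate between running alone and running with all SBCs."""
    return (per_sbc[0] - per_sbc[-1]) / per_sbc[0]


for test, per_sbc in RATES.items():
    totals = [round(t, 1) for t in aggregate_rates(per_sbc)]
    print(test, "total Kbyte/sec:", totals,
          "arbitration loss: %.2f%%" % (100 * arbitration_loss(per_sbc)),
          "per-SBC loss: %.2f%%" % (100 * per_sbc_loss(per_sbc)))
```

For Test 2 the total peaks with two SBCs, which is the case the text works through by hand.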

B.  IMPLEMENTATION OF 3x3 SPATIAL FILTERING ON
MULTIPLE MICROCOMPUTER SYSTEM

1.  Introduction

Four different implementations were compared. They differed in the manner of storing the programs, variables and data in various parts of the memory hierarchy, and in programming technique. For this development, all programs and data were stored in RAM on the single board microcomputers. The RAM has been separated into two types:

°  Unshared RAM: "private" to the microcomputer where the RAM is located.

°  Shared RAM: "global", accessible by other microcomputers on the same Multibus.

TABLE IV.4

PROGRAM, DATA AND VARIABLE ALLOCATION

Implementation   Program      Variables    Data

Case 1           Ideal case
Case 2*          Unshared     Unshared     Shared
Case 3           Unshared     Unshared     Shared
Case 4           Unshared     Shared       Shared
Case 5           Shared       Shared       Shared

The results are presented in Fig. 4.2, which expresses the number of frames per second on which the 3x3 spatial filtering task can be performed as a function of the number of microcomputers used to partition the spatial filtering into parallel operations. It should be pointed out that the image size is 30 x 30 pixels. The partitioning splits the image into equal parts for the several microcomputers.
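The partitioning idea, splitting the image into equal parts with each part filtered independently, can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: the image is split into contiguous bands of rows, only interior pixels are filtered (so a 32 x 32 input gives a 30 x 30 output), and the bands are processed one after another in a single Python process rather than on separate SBCs. The names `filter3x3`, `split_rows` and `filter_partitioned` are hypothetical.

```python
def filter3x3(image, weights, row_start, row_stop):
    """Apply a 3x3 weighted sum to the interior pixels of rows [row_start, row_stop)."""
    width = len(image[0])
    out = []
    for r in range(row_start, row_stop):
        out_row = []
        for c in range(1, width - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += weights[dr + 1][dc + 1] * image[r + dr][c + dc]
            out_row.append(acc)
        out.append(out_row)
    return out


def split_rows(first, last, parts):
    """Split the row range [first, last) into `parts` contiguous, near-equal bands."""
    base, extra = divmod(last - first, parts)
    bands = []
    start = first
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def filter_partitioned(image, weights, parts):
    """Filter the image band by band; each band stands in for one microcomputer's share."""
    height = len(image)
    result = []
    for start, stop in split_rows(1, height - 1, parts):
        # Each band reads one extra row above and below its own rows.
        result.extend(filter3x3(image, weights, start, stop))
    return result


# Synthetic 32 x 32 test image and an arbitrary 3x3 weight set (both made up).
image = [[(7 * r + 3 * c) % 11 for c in range(32)] for r in range(32)]
weights = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
whole = filter3x3(image, weights, 1, 31)
print(all(filter_partitioned(image, weights, n) == whole for n in range(1, 6)))
```

Because each band needs only its own rows plus one neighboring row on each side, the bands can be computed independently, which is what makes this filter a natural candidate for parallel partitioning.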

The  results  will  be  discussed  in  the  following. 

a.  The first case is not a measured result. It represents the ideal enhancement of computation by using multiple microcomputers. We first measured the execution speed of performing a spatial filter over the whole image by one microcomputer with program, variables and data all in the private unshared RAM of the SBC. There was no bus usage, and therefore no overhead due to bus communication. The maximum filtering speed is roughly two thousand pixels processed by this spatial filter per second. For more SBCs, we simply multiplied the rate by the number of microcomputers and plotted a "linear enhancement" curve. This represents the ideal case and serves as the goal for our partitioning to approach.
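The "linear enhancement" curve of Case 1 follows from two figures quoted in the text: roughly two thousand pixels per second for one SBC, and a 30 x 30 pixel image. The sketch below is only illustrative; the constant and function names are hypothetical, and the 2000 pixels/sec figure is the rounded value from the text, not a precise measurement.

```python
# Values taken from the text; the single-SBC rate is only approximate.
PIXELS_PER_SECOND_ONE_SBC = 2000.0
PIXELS_PER_FRAME = 30 * 30


def ideal_frames_per_second(num_sbcs):
    """Ideal ("linear enhancement") frame rate when num_sbcs share the image."""
    return num_sbcs * PIXELS_PER_SECOND_ONE_SBC / PIXELS_PER_FRAME


def efficiency(measured_fps, num_sbcs):
    """Fraction of the ideal frame rate achieved by a measured frame rate."""
    return measured_fps / ideal_frames_per_second(num_sbcs)


for n in range(1, 6):
    print(n, "SBC(s):", round(ideal_frames_per_second(n), 2), "frames/sec (ideal)")
```

A measured frame rate for any of Cases 2 to 5 can be passed to `efficiency` to express it as a fraction of this ideal.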

b.  Let us start with the case of lowest performance, Case 5. In this case, all program, variables and data were located in the shared memory of another SBC. It obviously required the maximum amount of transfer and system bus usage. It can be seen that the performance saturated quite quickly. We are obviously wasting the computational power of added microcomputers.

Figure 4.2  Performance of the Partitioning of a Spatial Filter in a Multiple Microcomputer System (Parallel Processing)

c.  Next, in Case 4, the program was stored in the private memory of the computing SBC, but the variables and data were stored in the global memory of another SBC. The throughput performance improved almost linearly with respect to the number of microcomputers but at a rate lower than the "ideal linear enhancement."

d.  In Case 3, both the program and variables were stored in the unshared private RAM, but the data were stored in the global RAM of another SBC. Further improvement was accomplished. However, about 20% of the computing capability was lost because of the overhead for the arbitration of multiple microcomputer requests.

e.  In Case 2, the locations of the program, variables and data are the same as in Case 3, but the programming is more clever in the sense that the number of accesses to the system bus by each microcomputer is minimized and, further, the occurrences of these system bus accesses are distributed as evenly in time as possible. It can be seen that the enhancement of total computing power is much closer to the "ideal linear enhancement" case.

f.  In summary, we have used the special case of spatial filtering to explore the behavior and improvement of computing by the multiple microcomputer system. It should be pointed out that although there have been a lot of ideas in this field, real experience is still very limited. Consequently, there is really no consensus on the philosophy, approaches and methodologies of effective partitioning for parallel and pipeline computing. This thesis is a first step in testing the uncharted waters. We only used a spatial filter to test the parallel processing. We have not yet used a problem to test pipeline processing or combined parallel/pipeline processing. Therefore, we do not intend to declare that the experience learned from this spatial filtering established a general methodology for effective partitioning.

But we feel that the following guidelines will probably be helpful when more complex problems are tested to develop a more thorough philosophy of partitioning:

a)  The bus usage should be minimized.

b)  The bus usage should be distributed more evenly in time. Concentration of bus usage should be avoided.

g.  Meanwhile, it should be pointed out that this implementation of spatial filtering is a test case based on a real computation problem. In addition to the experience learned for partitioning, the successful implementation of the spatial filtering involving up to five microcomputers in parallel processing convincingly proved that the random priority controller is working correctly.

V.  CONCLUSION AND RECOMMENDATIONS


A.  CONCLUSION 

1.  Motivation 

This  thesis  was  motivated  by  the  needs  of  new  smart 
sensor  developments.  With  the  anticipation  of  new  sensitive 
and  large  mosaic  optical  sensor  arrays  and  very  sophisticated 
signal/data  processing  capabilities  to  be  offered  by  VLSI/ 
VHSIC  electronics,  very  ambitious  mission  objectives  of  new 
surveillance,  search/track  and  weapon  guidance  systems  are 
being  proposed  and  developed,  which  require  new  signal  pro¬ 
cessing  techniques  to  accomplish  demanding  goals.  Further, 
they  require  very  sophisticated  processor  systems  which  are 
powerful  enough  to  implement  the  new  signal  processing 
algorithms  and  also  small  and  light  enough  for  mounting  on 
platforms  of  practical  systems. 

2.  Single Objective and Dual Tasks

This thesis has one single objective, to help to make the new "smart sensors" practical, but consists of two tasks to achieve this objective.

a.  Develop new adaptive filter techniques to process infrared images for enhancement of "target signal" to "background clutter noise" ratio.


b.  Develop  a  new  multiple  microcomputer  system  to 
implement  this  type  of  image  processing. 


3.  Extensions  and  Contributions 


Both  studies,  although  motivated  by  the  development 
of  "infrared  smart  sensors,"  are  generic  and  can  contribute 
to  broader  fields  much  beyond  the  image  processing  problems 
in  infrared  smart  sensor  systems. 

4.  Results  I  -  Adaptive  Filters 

The  following  results  have  been  obtained: 

a.  Adaptive  filter  research  done  in  the  past  was 
surveyed.  It  was  found  that: 

°  Practically  all  past  research  dealt  with  one  dimen¬ 
sional  problems,  except  one  by  B.  Evenor  who  extended 
the  LMS  algorithm  to  images  generated  by  Markov  models. 

°  Most  approaches  are  based  on  LMS  algorithms. 

b.  In  this  thesis  the  LMS  algorithm  was  extended  to 
process  real  world  infrared  images. 

c.  A  new  approach  to  nonrecursive  adaptive  filters 
was  developed  which  is  similar  to  searching  for  the  extreme 
point  in  optimization  problems. 

d.  Two optimization criteria were considered:

mMSE = minimization of mean square error
MSNR = maximization of signal to noise ratio.

e.  Seven different optimization/searching techniques were developed:

°  Gradient approaches: steepest descent; accelerated steepest descent; Amir's method (mMSE only)

°  Conjugate gradient approaches: Fletcher-Reeves; Pollack

°  Variable metric approach: Davidon-Fletcher-Powell

°  Amir's transform approach (MSNR only)


f.  These approaches were tested on two infrared test images:

°  Indiana - Blue spike band infrared image appropriate for high altitude downward looking infrared sensor systems.

°  China Lake - 10-13 micron thermal band infrared image appropriate for shorter distance side-looking infrared sensor systems.

The results are encouraging and show that these new adaptive filters are effective in suppressing background clutter and enhancing the "target signal" to "clutter noise" ratio.

5.  Results  II  -  Multiple  Microcomputer  System 

a.  The  tightly-coupled  multiple  microcomputer  research 
done  in  the  past  was  surveyed.  It  was  found  that: 

°  There are many conceptual designs of new multiple microcomputer systems. Only a very small number of these have embarked on actual developments with both hardware and software efforts.

°  More loosely coupled multiple microcomputer systems are being developed. They are mostly computer networks.

°  There are only two tightly coupled multiple microcomputer systems in operation today, based on the survey of the open literature. Both are at Carnegie Mellon University: C.mmp and Cm*. It should be noted that although C.mmp is a multiple minicomputer system, today's 16 bit microcomputers are fast approaching minicomputer performance.

b. After intensive consideration of the requirements of typical new smart sensor systems, not only in the mission signal processing area but also in the management, control, and communication areas, it was decided that a hierarchical architecture which supports tightly and loosely coupled systems simultaneously is attractive.


c. A multiple star, multiple cluster architecture using commercially developed 16 bit microcomputers was developed. It includes a complete star bus switch network managed by a control system with three levels of control: random priority controller, distributed controller, and central controller.

d. The basic concept of this hardware architecture has been given a preliminary test by simulated intercommunications. Extensive tests in real signal/data processing environments await the successful development of operating systems.

6. Results III - Implementation of Adaptive Spatial Filters on Microcomputers and Multiple Microcomputer Systems

a. The spatial filter program was coded for one mainframe, the IBM 360-67, and two 16 bit microcomputers: the DEC LSI-11 and the Intel 8612. The DEC LSI-11 has more mature floating point mathematics software and a hardware arithmetic IC chip, but is not as well suited for multiple microcomputer system development as the Intel 8612, whose floating point software is still very primitive. However, when coded in assembly language, the Intel 8612 performs the spatial filtering faster than the mainframe coded in high order language.

b. Using only one 16 bit 8612 microcomputer, the computation times for the 3 x 3 spatial filter and a 32 x 32 image have been measured as follows:

Spatial statistics computation = 0.72 sec.

Adaptive spatial filter design = 1.0 sec. (Conjugate gradient Pollack method)

Perform spatial filtering = 0.47 sec.
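
To make the filtering step and the measured figures concrete, here is a minimal sketch of applying a 3 x 3 weight mask to a 32 x 32 image. It is not the thesis code: it assumes the filter is evaluated only at interior pixels (no border padding), which this summary does not state, and the function name, test image and identity mask are hypothetical. The three stage times are the ones quoted above.

```python
def spatial_filter_3x3(image, weights):
    """Apply a 3x3 weight mask to every interior pixel of a 2-D image."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * (cols - 2) for _ in range(rows - 2)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    acc += weights[di + 1][dj + 1] * image[i + di][j + dj]
            out[i - 1][j - 1] = acc
    return out


# Single-microcomputer times quoted in the text, in seconds.
stage_times = {"statistics": 0.72, "filter design": 1.0, "filtering": 0.47}
total_time = sum(stage_times.values())  # sum of the three stages

# Hypothetical 32 x 32 test pattern and an identity mask for checking.
image = [[float((i * 7 + j * 3) % 11) for j in range(32)] for i in range(32)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
filtered = spatial_filter_3x3(image, identity)
```

Under the interior-only assumption the output is 30 x 30, so one pass costs 900 x 9 = 8100 multiply-adds. The three quoted stage times add up to about 2.19 sec for the whole sequence on one microcomputer.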

c. Several ways of implementing the filter on multiple microcomputers, placing the program, variables and data in the unshared private RAM and/or the shared global RAM, have been investigated.

It was found that the best way to improve the total execution speed of the spatial filtering is to use more microcomputers, storing the program and variables in the private RAM and the data in the global RAM. The image data is not moved into the microcomputer all at once. Instead, the data is moved, one item at a time, into the private RAM of the microcomputer only moments before it is needed for processing.
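
The arrangement described above can be sketched structurally as follows. This is a hedged illustration, not the thesis implementation: it assumes the image is partitioned into contiguous bands of rows, one band per worker, which this summary does not specify. Python threads stand in for the microcomputers, a shared list stands in for the global RAM, and each worker's local variables stand in for its private RAM. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_rows(global_image, weights, row_start, row_stop):
    """One worker: filter image rows row_start..row_stop-1 of the shared image."""
    local_weights = [row[:] for row in weights]  # private copy of the variables
    cols = len(global_image[0])
    band = []
    for i in range(row_start, row_stop):
        out_row = []
        for j in range(1, cols - 1):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # Fetch one data item from the shared store just before use.
                    pixel = global_image[i + di][j + dj]
                    acc += local_weights[di + 1][dj + 1] * pixel
            out_row.append(acc)
        band.append(out_row)
    return band


def parallel_filter(global_image, weights, n_workers):
    """Split the interior rows into bands and filter each band in its own worker."""
    rows = len(global_image)
    n_interior = rows - 2
    size = -(-n_interior // n_workers)  # ceiling division
    bounds = [(s, min(s + size, rows - 1)) for s in range(1, rows - 1, size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        bands = pool.map(
            lambda b: filter_rows(global_image, weights, b[0], b[1]), bounds
        )
        return [row for band in bands for row in band]


# Hypothetical 32 x 32 image and smoothing-style mask.
shared_image = [[float((i * 5 + j) % 13) for j in range(32)] for i in range(32)]
mask = [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]
one_worker = parallel_filter(shared_image, mask, 1)
four_workers = parallel_filter(shared_image, mask, 4)
```

The sketch shows only the partitioning and data placement. Python threads do not reproduce the timing behavior of separate processors contending for a shared bus, so it says nothing about the speed enhancement measured in the thesis.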

B.  RECOMMENDATION 

1. General

Both topics covered in this thesis are quite new. This research only opens the gate a little into two fields worthy of more investigation. Although this thesis is concerned mainly with the image processing developments and their implementations for infrared smart sensors, the techniques developed are generic and can be applied to much broader fields beyond smart sensors.
2. Adaptive Filters

The new techniques based on the concepts of gradient optimization search can be applied to most of the adaptive filter research done in the past using the LMS algorithm.

For adaptive image processing applications, they should be used to develop adaptive temporal filters if a series of successive image frames is rather well registered spatially from frame to frame, although there may be drift, jitter, rotations, etc. between frames.

These adaptive filters should be tested on more challenging real world images which have serious non-stationarity. Jamming and interference noise should be considered. The convergence time of the compiled adaptive filter programs should be measured to obtain the relative speed of convergence of all the adaptation methods. Adaptive filters for extended targets should be developed.

3.  Multiple  Microcomputer  System 

Although the subject of multiple microcomputer systems is not new, there are many unresolved questions that have hardly been touched because of the extensive effort required to make any type of multiple microcomputer system operational. Only two such systems are known to be working today, C.mmp and Cm*, although many system architectures have been proposed and conceptualized. A small number of these have been simulated. A smaller number of them are being emulated. An even smaller number of them are being built. Simulations and modeling used today for multiple microcomputer systems must be carefully and critically scrutinized for their validity and usefulness. It is extremely important to examine how the intercommunication overhead is modeled and simulated. There is very little first-hand experience in existence today.

Therefore,  a  wide  variety  of  problems  associated  with 
the  new  multiple  microcomputer  systems  must  be  researched, 
examined  and  answered. 

This thesis contributed to the formulation, design, fabrication and test of a multiple microcomputer system which can be used:

1.  Not  only  for  developing  effective  ways  of  implementing 
smart  sensor  image  processing,  in  general,  and  the  adaptive 
image  processing,  in  particular, 

2. But also as a test bed to investigate, verify, and improve the handling of several basic issues of multiple microcomputer systems. Included were considerations of:

a.  Effective  and  alternative  intercommunication  for 
combined  tightly  and  loosely  coupled  systems. 

b.  Effective  and  alternative  operating  systems  for 
real  time  signal  processing,  multi-tasking,  multi-users, 
security,  dynamic  reconfiguration  and  fault  tolerance. 

c. Effective and alternative programming methodologies for partitioning a given problem into a number of modules suitable for combined pipeline and parallel implementation on multiple microcomputer systems.


d. Effective and alternative ways of using the distributed capabilities of multiple microcomputer systems for fault tolerance, self-maintenance and error recovery.